San Jose, United States; Seattle, United States
Posted 5 days ago
About The Company
This company pioneers short-form video creation and social engagement, boasting a vast, engaged user base. Its platform empowers users with creative tools, filters, and effects. With a diverse content ecosystem, it’s a hub of creativity and expression. The proprietary algorithm ensures personalized content feeds, enhancing user engagement and satisfaction. This company wields significant influence on digital media, making it an invaluable partner for innovative collaborations and marketing endeavors.

Team Introduction
The AML Machine Learning Systems team provides an end-to-end (E2E) machine learning experience and machine learning resources for the company. The team builds heterogeneous ML training and inference systems based on GPUs and AI chips, and advances the state of the art in ML systems technology to accelerate models such as Stable Diffusion and LLMs. The team is also responsible for research and development of hardware acceleration technologies for AI and cloud computing, via technologies such as distributed systems, HPC, and RDMA networking. The team is reinventing the ML infrastructure for large-scale language models. We have published papers at top-tier conferences such as SIGCOMM, NSDI, EuroSys, OSDI, SOSP, MLSys, and NeurIPS.

Responsibilities
– Participating in online architecture design and optimization centered around deep model inference tasks, achieving high concurrency and throughput in large-scale online systems.
– Participating in the establishment of a comprehensive system covering stability, disaster recovery, R&D efficiency, and cost, enhancing overall system stability.
– Participating in the design and implementation of end-to-end online pipeline systems with multiple models, plugins, and storage-computation components, enabling agile, flexible, and observable continuous delivery.
– Collaborating closely with MLEs to optimize algorithms and systems.
– Being proactive, optimistic, and highly responsible, demonstrating a meticulous work ethic, and possessing strong team communication and collaboration skills.

Minimum Qualifications
– Experience in backend or infrastructure engineering, focusing on scalable APIs, distributed systems, or real-time data pipelines.
– Hands-on experience building reusable infrastructure for multi-channel applications.
– Strong expertise in high-performance computing, microservices, and cloud-native architectures.
– Familiarity with leveraging LLMs, NLP, and common machine learning techniques for customer engagement and personalization.
– Strong problem-solving mindset, with experience driving technical execution in cross-functional teams.
– Excellent coding skills, strong understanding of data structures, and fundamental knowledge of algorithms. Proficiency in programming languages such as C/C++, Java, Go, or Python.
– Familiarity with deep learning models and their applications, such as ResNet and BERT.
– Rich experience in online architecture, with the ability to troubleshoot independently.
– Strong sense of responsibility, good learning ability, communication skills, and self-motivation.

Preferred Qualifications
– Experience in the architecture of online and offline systems for recommendation, advertising, or search.
– Understanding of GPU hardware architecture, familiarity with the GPU software stack (CUDA, cuDNN), and experience in GPU performance analysis.
Job Features
Job Category | AI Engineering |
Seniority | Junior / Mid IC |
Base Salary | $120,000 - $310,000 |
Recruiter | loonat@ocbridge.ai |