Architect – AML Engine 3216

San Jose, US-United States
Posted 2 weeks ago
About The Company

This company pioneers short-form video creation and social engagement, boasting a vast, engaged user base. Its platform empowers users with creative tools, filters, and effects. With a diverse content ecosystem, it’s a hub of creativity and expression. The proprietary algorithm ensures personalized content feeds, enhancing user engagement and satisfaction. This company wields significant influence on digital media, making it an invaluable partner for innovative collaborations and marketing endeavors.


About the team

The mission of our AML (Applied Machine Learning) team is to advance the next-generation AI infrastructure and recommendation platform behind Ads ranking, Feed ranking, Search ranking, and Live & e-commerce ranking at our company. Our work has substantial impact on all of the company's core businesses. We are currently collaborating closely with business and algorithm teams to build the next generation of recommendation models at tremendous scale, which brings significant challenges and opportunities for machine learning infrastructure and systems. We are looking for machine learning infrastructure engineers to join our team to support and advance that mission.


Responsibilities

– Enable model scaling by several orders of magnitude through the next generation of ML infra.
– Support core business and algorithm teams in adopting the scaling capability into world-class scenarios.
– Identify key infra challenges/opportunities of scaling and collaborate closely with sister teams to deliver them.
– Keep abreast of business requirements and industry trends, and maintain the SOTA scaling of our ML infra.
– Improve usability, resource efficiency and service stability while continuing to push the limits of scale.


Qualifications

– Bachelor's degree in Software Engineering or a similar field.
– At least 5 years of experience in building Recommendation/Ads/LLM systems at large scale.
– Expert in one or more of the following fields: DP/TP/PP/EP/SP parallelism strategies, DeepSpeed/Megatron-LM parallel libraries, TensorFlow/PyTorch/JAX frameworks, GPU/TPU accelerators, XLA/TVM/MLIR/Triton compilers, large-scale scheduling/runtime for ML workloads, hardware-software co-design and optimization.


Preferred

– Understanding of SOTA Recommendation modeling technologies is a big plus.
– A success story or unique achievement is a big plus.
– Excellent logical analysis skills, with the ability to abstract and decompose a complex system design sensibly.
– Solid grasp of distributed systems principles, with experience in the design, development and maintenance of large-scale distributed systems.
– Strong sense of responsibility, good learning ability, communication skills and self-motivation, and the ability to respond and act quickly.
– Good documentation habits: writes and updates technical documents in a timely manner as required.

Job Features

Job Category: AI Engineering
Seniority: Senior IC / Tech Lead
Base Salary: $280,000 - $430,000
Recruiter: yaxin.fan@ocbridge.ai
