Software Engineer, Inference 3104

San Jose, US-United States
Posted 2 months ago
About The Company

This company pioneers short-form video creation and social engagement, boasting a vast, engaged user base. Its platform empowers users with creative tools, filters, and effects. With a diverse content ecosystem, it’s a hub of creativity and expression. The proprietary algorithm ensures personalized content feeds, enhancing user engagement and satisfaction. This company wields significant influence on digital media, making it an invaluable partner for innovative collaborations and marketing endeavors.

Our team was established to help realize the company's vision of building a global platform for creation and communication. We do world-class work in machine learning, computer vision, natural language processing, speech and audio, and knowledge, and we transfer that work into products used by hundreds of millions of people worldwide. As vital AI infrastructure for the company, our machine learning system integrates our most up-to-date R&D results in AI algorithms and systems. Join us, and you will have the chance to build large-scale machine learning systems and work with the best AI system and algorithm researchers and engineers.


What You’ll Be Doing

1. Develop and optimize the LLM inference framework.
2. Optimize GPU and CUDA performance to build an industry-leading, high-performance LLM inference engine.


Minimum Qualifications

– Bachelor’s degree or above in computer science, electronics, automation, software engineering, or a related field; experience in ML engineering optimization preferred
– Proficient in C/C++, algorithms, and data structures; familiar with Python
– Proficient in GPU high-performance computing optimization with CUDA; in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computing, etc.
– Understanding of the basic principles of deep learning algorithms, familiarity with the basic architecture of neural networks, and understanding of deep learning training frameworks such as PyTorch and TensorFlow
– Familiar with TensorRT-LLM, Orca, vLLM, etc.
– Knowledge of LLMs; experience accelerating and optimizing LLMs preferred

Job Features

Job Category: AI Engineering
Seniority: Junior / Mid IC
Base Salary: $120,000 - $310,000
Recruiter: luna.zheng@ocbridge.ai
