Production System Engineer 3025

San Jose, US-United States
Posted 2 months ago
About The Company

This company pioneers short-form video creation and social engagement, boasting a vast, engaged user base. Its platform empowers users with creative tools, filters, and effects. With a diverse content ecosystem, it’s a hub of creativity and expression. The proprietary algorithm ensures personalized content feeds, enhancing user engagement and satisfaction. This company wields significant influence on digital media, making it an invaluable partner for innovative collaborations and marketing endeavors.


Responsibilities

– Operation: As a Production Systems Engineer, your mission is to contribute to enhancing the quality, reliability, efficiency, effectiveness, and scalability of our data center and Cloud operations, platform, and service on a worldwide scale.
– Lifecycle Improvement: Engage in and improve the whole lifecycle of Infrastructure systems – from system design consulting through to launch reviews, deployment, operation, and refinement.
– Automation: Deliver tools and solutions to improve the automation, reliability, scalability, and operability of services.
– Monitoring: Deliver tools and solutions to monitor and improve availability, latency, and the overall health of services, servers, Cloud infrastructure, and the network.
– Disaster Recovery: Troubleshoot and resolve complex technical issues in a high-pressure, time-sensitive environment. Conduct high-level root-cause analysis for service interruptions and establish preventive measures. Practice sustainable incident response and postmortems.
– Cross-team Collaboration: Partner with stakeholders like infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and our internal customers to understand overarching business objectives. You will also have the opportunity to design and implement innovative solutions for our Core IDCs and CDN/Edge and Cloud Services.
– Technical Documentation: Create and maintain standard operating procedures and knowledge bases.
– On-call: Participate in our cross-continent on-call rotation and incident response teams to solve critical problems in production.


Minimum Qualifications

1. Education: Bachelor’s degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
2. Experience: Minimum of 3 years of experience in systems infrastructure operations or related fields, working with data center or CDN production systems and system design/validation.
3. Server Hardware: We seek individuals with more than just a basic understanding. You should be at an intermediate level or higher, where your hands-on experience in labs or data centers has forged a deep connection with server architecture.
4. Data Center: An intermediate level of expertise is preferred here. We’re on the lookout for those who are well-versed in the intricate details of operations, from routine tasks like OS installations and break-fix, to high-impact projects like planning and operations (covering the full infrastructure lifecycle), to new design-build facilities or renovations of existing systems.
5. Monitoring: Your knowledge should transcend the ordinary; we prefer intermediate-level skills. We expect you to be a maestro in the orchestration of tools and designs for monitoring server health, network switches, and the power and temperature conditions of the data center.
6. Automation: We welcome those who have delved into the realm of automation, ideally at an intermediate level. Your qualifications should reflect at least one automation project, showcasing your commitment to streamlining processes.
7. Linux: In the realm of Linux, we are in search of individuals with intermediate-level proficiency. Your mastery of this operating system should shine brightly.
8. Coding: As you navigate the digital landscape, fluency in Bash, Python, and Golang is strongly favored. Your coding skills will be your trusty companions on this adventure.
9. Network: When it comes to networks, we’re seeking at least a junior-level understanding. Your ability to chart the course through the network labyrinth is essential.
10. Communication: Experience managing and coordinating teams in a global environment.
11. Project Management: Experience in the preparation of project plans and specifications, drafting scopes of work, and managing multiple projects simultaneously.
12. Agile: Experience with Agile methodologies (e.g., Kanban, Scrum), including user stories, sprint planning, and backlog management.
13. Preferred But Not Required Skills: Golang, REST APIs, Gin, Ansible, Load Balancer, SQL, Hive, Hadoop, ClickHouse, Message Queue, Redis.

Job Features

Job Category: Backend
Seniority: Junior / Mid IC
Base Salary: $110,000 - $220,000
Recruiter: christopher.nguyen@ocbridge.ai
