Staff Site Reliability Engineer, AI Infrastructure

twelvelabs • United States

Visa Sponsorship

AI Summary

Staff SRE to own the reliability, scalability, and operability of AI/ML infrastructure for multimodal foundation models. Requires deep operational instincts, strong debugging skills, and cloud infrastructure experience. Focus on production health, observability, and incident response for core AI products.

Key Highlights

Own production reliability end-to-end for AI/ML infrastructure.

Partner with product engineering teams to ensure service reliability by design.

Design and operate cloud infrastructure supporting AI/ML workloads.

Technical Skills Required

AWS, Kubernetes, Prometheus, Grafana, Loki, Terraform, Ansible

Job Description

Who We Are:

At TwelveLabs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.

With a remarkable $107 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang, and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About The Role

As a Staff Site Reliability Engineer at TwelveLabs, you will own the reliability, scalability, and operability of the infrastructure that powers our multimodal foundation models. You'll be hands-on, building systems when needed, but with a primary focus on keeping production healthy, observable, and resilient.

You'll work most closely with the product teams in the US, supporting the infrastructure behind our core AI products. This role requires deep operational instincts, strong debugging skills, and the ability to balance long-term reliability investments against the pace of an early-stage AI company.

In this role, you will:

Own production reliability end to end — from deployment through monitoring, incident response, and postmortem-driven improvement.
Partner with the product engineering teams to ensure their services are reliable, observable, and operable by design.

Build and maintain observability systems (metrics, logging, tracing, alerting) that give the team clear signal on system health and performance.
Design and operate cloud infrastructure supporting AI/ML workloads.
Drive incident response — detect, diagnose, mitigate, and prevent production issues. Build the runbooks, automation, and guardrails that reduce mean time to recovery.
Identify and eliminate toil through automation, self-healing systems, and better tooling.

You may be a good fit if you have:

7+ years of experience operating production infrastructure systems, not just building them.
Strong hands-on experience with AWS and Kubernetes in production environments.

Solid fundamentals in OS internals, networking, storage, and compute — the kind that help you debug a problem at 3am without documentation.
Deep practical experience with observability (Prometheus/Grafana/Loki or equivalent), Infrastructure as Code (Terraform, Ansible), and CI/CD.
Track record of owning services end to end — deployment, monitoring, incident response, and postmortem follow-through.

Interviews

All virtual interviews will be conducted via video. To support identity verification and interview integrity, candidates may be asked to present a government-issued ID. Candidates may also be requested to disable video filters and use a clear, unobstructed background to facilitate effective communication.

Benefits And Perks

🤝 An open and inclusive culture and work environment

🚀 Work closely with a collaborative, mission-driven team on cutting-edge AI technology

🏥 Full health, dental, and vision benefits

🌴 Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Year's

💪 Monthly wellness stipend

📚 Annual Learning & Development stipend to invest in your growth

💼 Global offices in San Francisco and Seoul, and coworking office memberships for remote team members

🛂 VISA support where applicable

🚆 Transportation stipend

🍲 Daily lunch & dinner provided

Compensation Range: $220K - $250K

Job Overview

Posted Date May 29, 2026

Employment Type Full-time

Experience Level Mid-Senior level

Location United States

Annual Salary 220,000 - 250,000 USD

Category Devops

Company twelvelabs

Similar Jobs

Explore other opportunities that match your interests

AI Field Engineer (Enterprise)

Devops • 12h ago

Visa Sponsorship Relocation Remote

Job Type Full-time

Experience Level Entry level

world hr services

United States

AI Field Engineer – Enterprise (Remote, US-Based)

Devops • 12h ago

Visa Sponsorship Relocation Remote

Job Type Full-time

Experience Level Entry level

medilinkers llc

United States

Senior AI Field Engineer (Enterprise)

Devops • 17h ago

medilinkers llc

United States
