Job Description
The work
We run a Large Behavioral Model — custom transformer-based architecture — that generates billions of spatiotemporal predictions daily for Fortune 500 clients. The model is retrained on a regular cadence and serves predictions at scale.
You'll own the infrastructure that makes that possible: ML pipelines, cloud infrastructure across AWS and GCP, CI/CD for models, cost management, and monitoring. You'll work with ML engineers who depend on your systems being reliable, reproducible, and fast.
Key Responsibilities
- Build and maintain training and inference pipelines for the LBM, handling millions of predictions daily
- Operate scalable cloud infrastructure on AWS and GCP for training, deployment, and monitoring
- Optimize cloud resource usage — cost-effective scaling without sacrificing availability or performance
- Maintain CI/CD pipelines specifically for ML models, with proper dev / staging / prod separation
- Implement automation for model lifecycle management, retraining, and data pipeline orchestration
- Instrument model monitoring — performance, drift, latency, resource utilization — and wire up alerting
- Collaborate with ML engineers and product teams on model performance, retraining cadence, and infrastructure
- Document pipelines, deployments, and on-call runbooks
- Evaluate new MLOps tooling and techniques, and apply them where they measurably improve the platform
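To make the monitoring responsibility concrete, here is a minimal, illustrative sketch of one common drift check — the Population Stability Index (PSI) — wired to a rule-of-thumb alert threshold. The data, bin count, and threshold are assumptions for the example, not a description of our actual stack:

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges come from the expected (baseline) sample; a common
    rule of thumb treats PSI > 0.2 as significant drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def frac(sample):
        # Histogram the sample into the baseline's bins, clamping
        # out-of-range values into the edge bins.
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        return [c / len(sample) + eps for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Illustrative data: a uniform baseline and a shifted "drifted" sample.
baseline = [x / 100 for x in range(1000)]
drifted = [x / 100 + 3 for x in range(1000)]

if psi(baseline, drifted) > 0.2:
    pass  # in production this would page or open an incident
```

In practice a check like this runs on a schedule against fresh prediction inputs, with the result exported to the alerting stack rather than handled inline.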
Technical Skills Required
- Bachelor's, Master's, or PhD in CS, Engineering, AI, or a related field
- 3-5 years of hands-on experience in MLOps, ML platform, or ML infrastructure
- Production experience deploying, managing, and scaling ML workloads on AWS or GCP (SageMaker, Vertex AI, EC2, GKE, EKS)
- Strong proficiency with Docker, Kubernetes, and container orchestration
- Experience designing and operating CI/CD pipelines for ML models
- Performance tuning and cost optimization experience in cloud environments
- Strong programming skills in Python
- Working experience with PyTorch (or TensorFlow) in production
- Solid understanding of machine learning concepts, including deep learning and transformer-based architectures
- Strong written English and comfort working with a US-based team across time zones
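As one illustration of what "CI/CD pipelines for ML models" means day to day: a promotion gate that blocks a candidate model from reaching prod if it regresses against the current production model. The metric names, values, and tolerance below are invented for the example:

```python
def should_promote(candidate, production, max_regression=0.005):
    """Gate a model promotion in CI.

    Both arguments map metric name -> value; metrics are assumed
    higher-is-better (e.g. AUC, recall). The candidate must match
    production on every tracked metric within a small tolerance.
    Returns (ok, failures) where failures maps each failing metric
    to its (candidate, production) values.
    """
    failures = {
        name: (candidate.get(name, 0.0), baseline)
        for name, baseline in production.items()
        if candidate.get(name, 0.0) < baseline - max_regression
    }
    return len(failures) == 0, failures

# Illustrative metric snapshots.
prod = {"auc": 0.871, "recall_at_10": 0.642}
cand_ok = {"auc": 0.874, "recall_at_10": 0.640}   # tiny recall dip, within tolerance
cand_bad = {"auc": 0.843, "recall_at_10": 0.655}  # AUC regression, should block
```

In a real pipeline the metric snapshots would come from an evaluation job on a held-out set, and a failed gate would fail the CI stage rather than return a boolean.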
Nice to Have
- Production experience with both AWS and GCP
- Experience serving transformer-based models at scale (TorchServe, Triton, vLLM, or similar)
- Distributed training experience (DDP, FSDP, Ray, or similar)
- Terraform or other infrastructure-as-code
- Feature store experience (Feast, Vertex AI Feature Store, SageMaker Feature Store, or similar)
- Prometheus, Grafana, or similar observability stacks
- Prior experience in a fast-paced startup environment
- AdTech, MarTech, or consumer behavioral data domains
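On the observability side (Prometheus, Grafana), here is a small sketch of rendering metrics in the Prometheus text exposition format — the payload a `/metrics` endpoint returns. The metric names and labels are illustrative, not a real schema:

```python
def render_metrics(metrics):
    """Render {(name, labels): value} pairs in the Prometheus text
    exposition format.

    `labels` is a tuple of (key, value) pairs so the whole key is
    hashable. Real exporters also emit # HELP and # TYPE comment
    lines, omitted here for brevity.
    """
    lines = []
    for (name, labels), value in sorted(metrics.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Illustrative metrics for a model-serving service.
sample = {
    ("lbm_predictions_total", (("env", "prod"), ("model", "lbm-v3"))): 12345,
    ("lbm_inference_latency_seconds", (("env", "prod"), ("quantile", "0.99"))): 0.041,
}
```

In practice you would use an official Prometheus client library rather than hand-rolling this; the sketch just shows what the scrape target actually serves.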
Benefits & Perks
- Full-time, fully remote within India
- Salary: ₹25-40 LPA depending on experience
- Overlap with US hours: 3-4 hours with US Eastern, typically evenings IST
- Hands-on work with a modern ML stack at petabyte scale
- Direct collaboration with ML engineers and technical leads
- Clear growth path with increasing ownership over time