Core & ML Ops Team Lead

Jobgether • India
Remote

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Core & ML Ops Team Lead in India.

This role is ideal for an experienced technical leader in MLOps and distributed systems, responsible for building and maintaining the scalable infrastructure that supports mission-critical services. You will lead a cross-functional team in designing platforms for model training, orchestration, deployment, and monitoring while ensuring high performance, reliability, and security. The position combines hands-on engineering with strategic team leadership, driving adoption of best practices, automation, and observability across the organization. You will collaborate with product, operations, and security teams to implement robust platforms that empower engineers to build and deploy services confidently. Mentorship, knowledge sharing, and establishing production-ready standards are central to your impact. This role allows you to shape platform strategy while staying deeply engaged in cutting-edge technologies and ML operations at scale.

Accountabilities

  • Lead the Core & MLOps team, overseeing roadmap, prioritization, delivery, and mentoring
  • Design, develop, and maintain scalable infrastructure for model training, serving, and monitoring
  • Build and maintain the Golden Path: reference repositories, scaffold CLIs, CI/CD pipelines, runtime contracts, and production-ready defaults
  • Operate secure, multi-tenant model registries and orchestration platforms with standardized experiment and evaluation frameworks
  • Integrate AI/ML capabilities as managed platform services with cost and governance controls
  • Collaborate with product engineering, operations, and security teams on adoption and rollout plans
  • Promote best practices in observability, reliability, cost governance, and platform standardization

Requirements

  • 5+ years building distributed systems; 3+ years in MLOps or ML platform engineering
  • Strong knowledge of Linux/OS internals, networking, concurrency, and performance profiling
  • Deep expertise in Kubernetes (bonus: Mesos) and GPU infrastructure management
  • Proficiency in high-performance programming (Java, Rust, Go, C++) and strong Python skills
  • Experience designing and operating production model platforms (registry, training, serving, monitoring)
  • Proven experience leading technical teams and implementing organization-wide platform solutions
  • Familiarity with CI/CD, SRE practices, observability, and reliability enablement
  • Strong collaboration, mentoring, and communication skills

Preferred

  • Experience with streaming/workflow tools (Kafka, Argo, Temporal, Airflow)
  • Hands-on work with eBPF observability, perf tooling, or io_uring
  • Expertise in ML/AI cost optimization, multi-tenant quotas, and fairness
  • Experience authoring Golden Paths (service templates, CI/CD blueprints, scaffolds)

Benefits

  • Flexible remote work in a fully distributed, global environment
  • Exposure to cutting-edge open-source technologies and ML infrastructure
  • Collaborative, multi-cultural team fostering innovation and knowledge sharing
  • Freedom to shape platform architecture and engineering practices
  • Opportunities for career growth and technical leadership impact

Why Apply Through Jobgether?

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

