Lead a cross-functional team in designing platforms for model training, orchestration, deployment, and monitoring. Ensure high performance, reliability, and security. Collaborate with product, operations, and security teams.
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Core & ML Ops Team Lead in India.
This role is ideal for an experienced technical leader in MLOps and distributed systems, responsible for building and maintaining the scalable infrastructure that supports mission-critical services. You will lead a cross-functional team in designing platforms for model training, orchestration, deployment, and monitoring while ensuring high performance, reliability, and security. The position combines hands-on engineering with strategic team leadership, driving adoption of best practices, automation, and observability across the organization. You will collaborate with product, operations, and security teams to implement robust platforms that empower engineers to build and deploy services confidently. Mentorship, knowledge sharing, and establishing production-ready standards are central to your impact. This role allows you to shape platform strategy while staying deeply engaged in cutting-edge technologies and ML operations at scale.
Accountabilities
- Lead the Core & MLOps team, overseeing roadmap, prioritization, delivery, and mentoring
- Design, develop, and maintain scalable infrastructure for model training, serving, and monitoring
- Build and maintain the Golden Path: reference repositories, scaffold CLIs, CI/CD pipelines, runtime contracts, and production-ready defaults
- Operate secure, multi-tenant model registries and orchestration platforms with standardized experiment and evaluation frameworks
- Integrate AI/ML capabilities as managed platform services with cost and governance controls
- Collaborate with product engineering, operations, and security teams on adoption and rollout plans
- Promote best practices in observability, reliability, cost governance, and platform standardization
Technical Skills Required
- 5+ years building distributed systems; 3+ years in MLOps or ML platform engineering
- Strong knowledge of Linux/OS internals, networking, concurrency, and performance profiling
- Deep expertise in Kubernetes (bonus: Mesos) and GPU infrastructure management
- Proficiency in high-performance programming languages (Java, Rust, Go, C++) and strong Python skills
- Experience designing and operating production model platforms (registry, training, serving, monitoring)
- Proven experience leading technical teams and implementing organization-wide platform solutions
- Familiarity with CI/CD, SRE practices, observability, and reliability enablement
- Strong collaboration, mentoring, and communication skills
Nice to Have
- Experience with streaming/workflow tools (Kafka, Argo, Temporal, Airflow)
- Hands-on work with eBPF observability, perf tooling, or io_uring
- Expertise in ML/AI cost optimization, multi-tenant quotas, and fairness
- Experience authoring Golden Paths (service templates, CI/CD blueprints, scaffolds)
Benefits & Perks
- Flexible remote work environment, fully distributed globally
- Exposure to cutting-edge open-source technologies and ML infrastructure
- Collaborative, multi-cultural team fostering innovation and knowledge sharing
- Freedom to shape platform architecture and engineering practices
- Opportunities for career growth and technical leadership impact
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.