We are seeking a Senior Machine Learning Engineer to build and operate production-grade, cloud-native machine learning and LLM-powered systems for our Digital's Intelligence Cloud. The successful candidate will have strong Python-based machine learning and API engineering skills, with a proven history of shipping well-tested, production-grade systems. The role requires strong collaboration skills, working with data scientists, backend engineers, and architects.
Job Description
Data/ML Engineer:
The machine learning engineer role focuses on building and operating production-grade, cloud-native machine learning and LLM-powered systems for our Digital’s Intelligence Cloud, an AI-powered SaaS platform running on AWS.
Successful candidates will have strong Python-based machine learning and API engineering skills, with a proven history of shipping well-tested, production-grade systems. Strong working knowledge of AWS and CI/CD pipelines is required, along with hands-on exposure to Docker and Kubernetes. Experience engineering, deploying, monitoring, and operating ML and LLM-based systems using MLflow (mandatory) and modern ML frameworks is required.
Experience Level:
Candidates with 4–5 years of overall software engineering experience are preferred, including at least 2 years shipping production-grade Python systems.
Core Skills:
- Self-driven with strong ownership mindset; comfortable working under ambiguity and evolving requirements
- Strong collaboration skills, working with data scientists, backend engineers, and architects
Required Skills:
- Strong production-grade Python skills with ability to write clean, modular, testable code.
- Strong API engineering skills, including development of RESTful APIs using FastAPI and/or Flask.
- Hands-on experience building APIs backed by Elasticsearch indexes, including search and retrieval workflows.
- Experience delivering Python services using CI/CD pipelines, with strong coding standards, automated testing, and version control.
- Strong understanding of build, test, release, and packaging practices for Python applications.
- Strong practical understanding of machine learning concepts, including model training, validation, and evaluation, with hands-on experience engineering, deploying, scaling, and operating models built by data scientists in production environments.
- Mandatory experience with MLflow for experiment tracking, model versioning, and lifecycle management
- Good understanding of large language models (LLMs) and how inference APIs work, with hands-on experience using AWS Bedrock, OpenAI, or Hugging Face APIs
- Exposure to agentic AI systems (e.g., multi-step reasoning, tool usage, orchestration, memory) is required; candidates are expected to be able to productionize and operate such systems
- Working knowledge of LLM orchestration frameworks such as LangChain/LangGraph
- Hands-on exposure to Docker and Kubernetes for deploying and operating ML and LLM services
- Working knowledge of Apache Spark for distributed data processing
- Experience with data engineering and ETL workflows to prepare datasets for machine learning
- Working knowledge (required) of common ML and data processing libraries such as Pandas, NumPy, scikit-learn, TensorFlow/Keras, or PyTorch
- Knowledge of a strongly typed language such as Java, C# or Rust in addition to Python.
Core responsibilities:
- Work with globally distributed data science, data engineering, frontend, and backend teams to deliver ML and LLM systems in production
- Design, develop, and deploy backend APIs for ML and LLM inference using Python
- Collaborate closely with data scientists while owning productionization, deployment, and operational stability of ML and LLM services
- Design and implement agentic AI systems, including multi-step workflows, tool invocation, orchestration, and production-readiness, in collaboration with MLOps teams
- Write and maintain production-grade Python code supporting ML, LLM, and search-driven workloads on AWS
- Design systems with attention to inference latency, scalability, reliability, and operational cost
- Ensure strong unit test coverage and support QA teams in building automated test strategies
- Maintain clear, current technical documentation for owned systems
- Deploy ML models and LLM services as RESTful APIs and/or event-driven services
- Contribute to model monitoring, including experiment tracking, inference logging, metrics, and performance analysis
Job Location:
Bangalore (Remote)