AI Summary
Join a fast-moving engineering team to build production-grade LLM-powered services and RAG pipelines. Design, implement, and operate scalable retrieval, embedding, and inference pipelines for enterprise customers.
Key Highlights
Design and implement end-to-end RAG workflows
Develop robust Python services integrating Transformers-based models and vector search
Optimize embedding strategies, retrieval quality, and prompt templates
Technical Skills Required
Python, backend engineering, RAG, LLMs
Benefits & Perks
Fully remote role with flexible hours
Outcomes-driven culture
Mentorship-oriented environment
Job Description
Primary Job Title: Machine Learning Engineer — LLM & RAG
Industry: Enterprise AI / Software & Cloud Solutions
Sector: Large Language Model (LLM) applications, Retrieval-Augmented Generation (RAG), and production ML services for business workflows
Location: India (Remote)
About The Opportunity
Join a fast-moving engineering team building production-grade LLM-powered services and RAG pipelines that enable intelligent search, document understanding, and agentic automation for enterprise customers. You will design, implement, and operate scalable retrieval, embedding, and inference pipelines—turning research-grade models into reliable, low-latency products.
Role & Responsibilities
- Design and implement end-to-end RAG workflows: document ingestion, embedding generation, vector indexing, retrieval, and LLM inference.
- Develop robust Python services that integrate Transformers-based models, LangChain pipelines, and vector search (FAISS/Milvus) for production APIs.
- Optimize embedding strategies, retrieval quality, and prompt templates to improve relevance, latency, and cost-efficiency.
- Build scalable inference stacks with serving, batching, caching, and monitoring to meet SLA targets for throughput and latency.
- Collaborate with data scientists and product teams to evaluate model architectures, run A/B tests, and implement continuous retraining/validation loops.
- Implement observability, CI/CD, and reproducible deployments (Docker-based containers, model versioning, and automated tests).
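The end-to-end RAG workflow described above (ingestion, embedding, indexing, retrieval, prompt assembly) can be sketched in miniature. This is a toy illustration only: the bag-of-words "embedding" stands in for a real Transformers sentence-embedding model, and the brute-force `ToyIndex` stands in for a FAISS/Milvus vector index; all names here are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; in production this would be a
    # Transformers sentence-embedding model producing dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyIndex:
    """Brute-force stand-in for a FAISS/Milvus vector index."""
    def __init__(self):
        self.docs: list[str] = []
        self.vecs: list[Counter] = []

    def add(self, doc: str) -> None:
        # Ingestion + embedding generation.
        self.docs.append(doc)
        self.vecs.append(embed(doc))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval: rank all documents by similarity to the query.
        qv = embed(query)
        scored = sorted(zip(self.docs, self.vecs),
                        key=lambda dv: cosine(qv, dv[1]), reverse=True)
        return [d for d, _ in scored[:k]]

def build_prompt(query: str, context: list[str]) -> str:
    # Prompt template: retrieved context is inlined for LLM inference.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

index = ToyIndex()
for doc in ["Invoices are stored in S3.",
            "Refunds require manager approval.",
            "The API rate limit is 100 requests per minute."]:
    index.add(doc)

prompt = build_prompt("What is the API rate limit?",
                      index.search("api rate limit", k=1))
print(prompt)
```

In a production service, each stage swaps in real infrastructure (an embedding model, an ANN index, an LLM call), but the data flow, ingest, embed, index, retrieve, assemble prompt, stays the same.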
Must-Have
- 4+ years of professional experience in ML or software engineering with hands-on LLM/RAG work.
- Strong Python programming and system-design skills for production services.
- Experience with Transformers-based models and fine-tuning/inference workflows.
- Proven experience building retrieval pipelines using vector search (FAISS, Milvus) and embeddings.
- Familiarity with LangChain or equivalent orchestration libraries for LLM workflows.
- Practical experience containerizing and deploying ML workloads (Docker, CI/CD, basic infra automation).
- Experience with cloud ML infrastructure (AWS, Azure, or GCP) and model serving at scale.
- Familiarity with Kubernetes or other orchestration for production deployments.
- Experience with retrieval evaluation, relevance metrics, and A/B experimentation.
Benefits & Perks
- Fully remote role with flexible hours and an outcomes-driven culture.
- Opportunity to ship end-to-end LLM products and influence architecture choices.
- Mentorship-oriented environment with access to modern tools and model stacks.
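The retrieval evaluation and relevance metrics mentioned in the requirements can be sketched with two standard measures, recall@k and reciprocal rank. This is a minimal illustration with made-up document IDs, not a prescribed evaluation harness.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant hit; 0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical ranked output for one query, with its gold relevant set.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, k=3))   # 0.5: only d1 is in the top 3
print(reciprocal_rank(retrieved, relevant))    # 0.5: first hit at rank 2
```

Averaging these per-query scores over a labeled query set gives the aggregate metrics typically compared in A/B experiments on retrieval changes.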
Skills: Python, Backend, RAG, LLM