Senior Backend Engineer for Machine Learning Infrastructure and Reliability
Design, build, and operate production Django services that orchestrate distributed ML workflows. Build high-throughput async job processing systems and implement reliability patterns. Collaborate with ML teams to productionise training and inference pipelines.
Job Description
⚙️ Senior Backend Engineer – ML Infrastructure & Reliability
📍 Remote (CET) | Full-Time
A high-growth AI technology company is building large-scale machine learning platforms that power content generation for global enterprise brands. Their production systems coordinate high-throughput ML inference across multiple services and external providers.
They are hiring a Senior Backend Engineer to take ownership of reliability, orchestration, and performance across their core backend platform.
💻 The Role
You will design, build, and operate production Django services that orchestrate distributed ML workflows. The focus is on building highly reliable backend systems capable of handling asynchronous processing at scale.
🛠 Key Responsibilities
• Design and maintain Django services supporting ML inference workflows
• Build high-throughput async job processing systems using queues and schedulers
• Implement reliability patterns including retries, idempotency, rate limiting, and backpressure
• Own observability strategy including metrics, tracing, logging, and alerting
• Lead incident response and drive long-term reliability improvements
• Collaborate with ML teams to productionise training and inference pipelines
• Support CI/CD and infrastructure automation using Infrastructure as Code
✅ Requirements
• Strong Python backend engineering background
• Proven experience running Django applications in production
• Experience building asynchronous processing systems (Celery, RQ, Arq, or similar)
• Solid understanding of distributed systems reliability principles
• Experience with AWS or GCP cloud environments
• Practical Infrastructure as Code experience (Terraform or similar)
⭐ Nice To Have
• Experience operating ML infrastructure or MLOps platforms
• Familiarity with orchestration tools (Airflow, Temporal, Prefect, Step Functions)
• Experience with observability stacks such as Prometheus, Grafana, or OpenTelemetry
• Experience scaling Postgres or caching systems like Redis
🌟 Why Join
• Own reliability of business-critical AI production systems
• Solve complex distributed systems challenges
• Work closely with ML and backend engineering teams
• Fully remote within the CET time zone
• High autonomy and strong technical ownership
📩 If you are interested, message me directly or apply via LinkedIn.