Senior Backend Engineer for Machine Learning Infrastructure and Reliability

Remote
AI Summary

Design, build, and operate production Django services that orchestrate distributed ML workflows. Build high-throughput async job processing systems and implement reliability patterns. Collaborate with ML teams to productionise training and inference pipelines.

Key Responsibilities
Design and maintain Django services supporting ML inference workflows
Build high-throughput async job processing systems using queues and schedulers
Implement reliability patterns including retries, idempotency, rate limiting, and backpressure
Own observability strategy including metrics, tracing, logging, and alerting
Lead incident response and drive long-term reliability improvements
Collaborate with ML teams to productionise training and inference pipelines
Support CI/CD and infrastructure automation using Infrastructure as Code
Technical Skills Required
Python, Django, Celery, RQ, Arq, AWS, GCP, Terraform, Postgres, Redis
Benefits & Perks
Fully remote within the CET time zone
High autonomy and strong technical ownership
Nice to Have
Experience operating ML infrastructure or MLOps platforms
Familiarity with orchestration tools (Airflow, Temporal, Prefect, Step Functions)
Experience with observability stacks such as Prometheus, Grafana, or OpenTelemetry

Job Description


⚙️ Senior Backend Engineer – ML Infrastructure & Reliability

📍 Remote (CET) | Full-Time


A high-growth AI technology company is building large-scale machine learning platforms that power content generation for global enterprise brands. Their production systems coordinate high-throughput ML inference across multiple services and external providers.

They are hiring a Senior Backend Engineer to take ownership of reliability, orchestration, and performance across their core backend platform.


💻 The Role

You will design, build, and operate production Django services that orchestrate distributed ML workflows. The focus is on building highly reliable backend systems capable of handling asynchronous processing at scale.


🛠 Key Responsibilities

• Design and maintain Django services supporting ML inference workflows

• Build high-throughput async job processing systems using queues and schedulers

• Implement reliability patterns including retries, idempotency, rate limiting, and backpressure

• Own observability strategy including metrics, tracing, logging, and alerting

• Lead incident response and drive long-term reliability improvements

• Collaborate with ML teams to productionise training and inference pipelines

• Support CI/CD and infrastructure automation using Infrastructure as Code


✅ Requirements

• Strong Python backend engineering background

• Proven experience running Django applications in production

• Experience building asynchronous processing systems (Celery, RQ, Arq, or similar)

• Solid understanding of distributed systems reliability principles

• Experience with AWS or GCP cloud environments

• Practical Infrastructure as Code experience (Terraform or similar)


⭐ Nice To Have

• Experience operating ML infrastructure or MLOps platforms

• Familiarity with orchestration tools (Airflow, Temporal, Prefect, Step Functions)

• Experience with observability stacks such as Prometheus, Grafana, or OpenTelemetry

• Experience scaling Postgres or caching systems like Redis


🌟 Why Join

• Own reliability of business-critical AI production systems

• Solve complex distributed systems challenges

• Work closely with ML and backend engineering teams

• Fully remote within the CET time zone

• High autonomy and strong technical ownership

📩 If you are interested, message me directly or apply via LinkedIn.

