Senior Backend Engineer for ML Infrastructure and Reliability

Remote
Job Description


Senior Backend Engineer, ML Infrastructure & Reliability


A rare opportunity has opened for an experienced Senior Backend Engineer to own and scale backend systems powering high-throughput ML inference for a growing AI platform.


In this role, you will design, build, and operate production-grade Django services that orchestrate ML workflows across internal systems and external providers. You will take end-to-end ownership of reliability, observability, and performance, ensuring that the platform scales safely as usage grows.


If you enjoy tackling complex reliability and orchestration challenges and building backend systems that must perform at scale, we invite you to apply: this role offers real ownership and technical impact.


WHAT WE OFFER

  • Full-time, B2B
  • 100% remote
  • High ownership of core backend infrastructure
  • Collaboration with ML and backend engineers
  • Exposure to high-throughput, distributed ML workflows
  • Pragmatic, product-driven engineering culture


YOUR ROLE

  • Build and maintain Django services for ML inference workflows
  • Implement asynchronous execution with queues, workers, and schedulers
  • Ensure reliability: idempotency, retries, rate limiting, backpressure
  • Define and operate SLOs/SLAs; lead incident response and postmortems
  • Implement end-to-end observability: metrics, logs, traces, dashboards, alerts
  • Collaborate with ML engineers to productionize pipelines
  • Support infrastructure with Terraform and CI/CD


IF YOU ARE A PERSON WHO

  • Has strong experience as a Python backend engineer with production ownership
  • Has hands-on experience running Django in production (ORM, migrations, request lifecycle, performance tuning)
  • Has built and operated asynchronous job systems
  • Understands distributed system reliability and orchestration patterns
  • Knows Linux, networking, and cloud platforms (AWS/GCP)
  • Has practical experience with Infrastructure as Code
  • Has operated ML infrastructure at scale or worked with MLOps tooling (nice to have)
  • Thrives in high-ownership, fast-paced environments


Congrats! This role is ideal for YOU!

