Senior ML Systems Engineer

A5 Labs • APAC
Remote
AI Summary

Design, build, and optimize end-to-end infrastructure for AI-driven gameplay, simulation, analytics, and security. Focus on C++ systems, machine learning infrastructure, and large-scale data pipelines. Enable fast iteration, scalable training, and reliable deployment.

Key Highlights
Design and maintain end-to-end ML/RL systems
Build and optimize high-throughput data pipelines
Profile and optimize system performance
Establish engineering standards for ML systems
Technical Skills Required
C++, Python, TypeScript, React, Machine Learning, RL, Large-scale data pipelines
Benefits & Perks
Competitive salary
Performance-based bonuses
Fully remote work

Job Description


About A5 Labs

A5 Labs is a US-based, fully remote AI and gaming technology company.

We build large-scale systems behind competitive online games, focusing on AI-driven gameplay, simulation, analytics, and security (anti-cheat).

Our work sits at the intersection of C++ systems, machine learning infrastructure, and large-scale data pipelines, enabling researchers and engineers to iterate quickly and deploy models reliably in production.



Role Summary

We are looking for a senior ML systems engineer to design, build, and optimize the end-to-end infrastructure that connects:

  • C++ game environments and inference serving
  • Python-based training, evaluation, and analytics pipelines

This role is systems- and performance-oriented, not an algorithm research position.

Your impact will be enabling fast iteration, scalable training, and reliable deployment across the entire ML/RL stack.



What You’ll Do

  • Design and maintain end-to-end ML/RL systems spanning:
      • C++ game environments and real-time inference serving
      • Python-based training, evaluation, and data pipelines
  • Build and optimize high-throughput data pipelines that serve:
      • RL training
      • evaluation and visualization
      • gameplay analytics
      • anti-cheat models and research workflows
  • Profile and optimize system performance (latency, throughput, memory, GPU utilization) across C++ and Python components.
  • Design infrastructure that enables fast iteration from small-scale experiments to large-scale training and production.
  • Improve observability by building tools for:
      • logging and metrics
      • replaying problem cases
      • debugging model and system behavior
  • Establish engineering standards for ML systems:
      • testing
      • CI/CD
      • coding and operational guidelines



Ideal Experience

  • Strong experience with C++ systems, including implementation, profiling, and optimization.
  • Experience building ML training and evaluation pipelines in Python.
  • Experience designing systems that bridge data generation (env / serving) and model training.
  • Solid understanding of ML/RL workflows (training, evaluation, inference); algorithm research is not the focus.
  • Experience with production ML systems, CI/CD, and test automation.
  • Bonus:
      • experience with simulation, game engines, or real-time inference systems
      • experience supporting researchers with scalable infrastructure



Typical Problems You’ll Work On

  • How do we design a system where a C++ game environment efficiently feeds data to training, evaluation, analytics, and anti-cheat pipelines?
  • How do we profile and optimize the end-to-end loop from environment → data → training → inference?
  • How do we enable researchers to explore new ideas without heavy infrastructure overhead?
  • How do we build observability tools that expose both model behavior and system bottlenecks?
  • How do we ensure fast, reliable, and scalable C++ inference serving in production?



Tech Stack

  • C++ (game environments, inference serving)
  • Python (training, evaluation, data pipelines)
  • TypeScript / React (internal tools and visualization)



Location & Compensation

  • Fully remote
  • Competitive salary + performance-based bonuses
