ML Engineer - Reinforcement Learning (Data Center Cooling)

wave recruitment • United Kingdom

Visa Sponsorship

AI Summary

Build and deploy reinforcement learning agents to control data center cooling systems. Design reward functions and constraints that work within physical limits and SLAs. Work across research and deployment to create stable solutions for real-world infrastructure.

Key Highlights

Develop RL agents for cooling control in live data centers

Work with physics-based simulators and digital twins

Deploy models on-prem at the edge, with training federated across sites

Key Responsibilities

Train and deploy deep RL agents for live cooling control

Design reward functions and constraints that hold up against physical limits and SLAs

Move between research-style exploration and engineering work for real site deployment

Build and improve physics-based simulators, surrogate models, and digital twins

Close the gap between simulation and real hardware performance

Implement federated and distributed training across sites

Manage edge deployment, monitoring, and retraining of agents in production

Technical Skills Required

Python, PyTorch, JAX, Gymnasium, Reinforcement Learning, Deep RL, Federated Learning, Distributed Training, Edge ML Deployment, Physics-based Simulators, Digital Twins

Benefits & Perks

£110K-£150K salary

Competitive equity

Hybrid working (1 day/week in King's Cross)

Visa sponsorship available

Nice to Have

Control systems (classical control, MPC)

HVAC, thermodynamics, power systems, or data centre operations

Federated learning, distributed training, or edge ML deployment

Simulation experience - building or using physics-based simulators

Digital twins, surrogate models, or large physics models

Published research or open-source contributions

Job Description

ML Engineer - Reinforcement Learning, London (hybrid, 1 day/week in King's Cross) - Solve data centre cooling issues

Cooling is one of the largest items on a data centre's energy bill, and most sites run it conservatively because getting it wrong puts the hardware at risk. Our client trains reinforcement learning agents to control cooling systems on live sites, cutting cooling energy without breaching the temperature and humidity limits operators are contractually bound to.

They're hiring an ML Engineer - Reinforcement Learning to build those agents and get them running on real data centres. You'll report to the CTO / Head of AI and work across the line between research and deployment.

The System

The agents don't learn on the live plant. They train against a digital twin of each site, then move to production once they're safe.

Reward and constraint design is shaped by ASHRAE standards and customer SLAs - air temperature, humidity, and rate-of-change limits on cooling air and chilled water setpoints
Training is federated across multiple sites. Agents share learned control strategies without any site's operational data leaving the building, which delivers significantly more savings than a single-site approach
Models are deployed on-prem at the edge, then monitored and retrained in place
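
For a concrete feel of what reward and constraint design can look like in this setting, here is a minimal sketch. It is not the client's implementation. The names (Limits, clamp_setpoint, reward), the penalty weight, and every numeric limit are hypothetical placeholders chosen for illustration, not ASHRAE or SLA figures. It assumes a reward of negative cooling power minus a penalty for leaving the temperature and humidity envelope, with a rate-of-change clamp on setpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Hypothetical placeholder values; real limits would come from the
    # site's SLA and the applicable ASHRAE guidance.
    temp_min_c: float = 18.0
    temp_max_c: float = 27.0
    rh_min_pct: float = 20.0
    rh_max_pct: float = 80.0
    max_setpoint_step_c: float = 0.5  # largest setpoint change per control step


def clamp_setpoint(previous_c: float, requested_c: float, limits: Limits) -> float:
    """Apply the rate-of-change limit to a requested setpoint."""
    step = requested_c - previous_c
    step = max(-limits.max_setpoint_step_c, min(limits.max_setpoint_step_c, step))
    return previous_c + step


def violation(value: float, low: float, high: float) -> float:
    """Distance outside [low, high]; zero when the value is inside."""
    return max(low - value, 0.0) + max(value - high, 0.0)


def reward(cooling_power_kw: float, air_temp_c: float, rel_humidity_pct: float,
           limits: Limits, penalty_weight: float = 100.0) -> float:
    """Negative energy use minus a weighted penalty for leaving the envelope."""
    breach = (violation(air_temp_c, limits.temp_min_c, limits.temp_max_c)
              + violation(rel_humidity_pct, limits.rh_min_pct, limits.rh_max_pct))
    return -cooling_power_kw - penalty_weight * breach
```

In this sketch the rate-of-change limit is enforced as a hard clamp on the action, while the temperature and humidity limits enter as a soft penalty. A production design might instead treat those limits as hard constraints or add a safety layer; which approach the client uses is not stated in the posting.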

What You'll Own

Reinforcement Learning

Train and deploy deep RL agents for live cooling control
Design reward functions and constraints that hold up against physical limits and SLAs, not just in a notebook
Move between research-style exploration and the engineering work to make something stable on a real site

Simulation and Digital Twins

Build and improve the physics-based simulators, surrogate models, and digital twins the agents train against
Close the gap between what works in simulation and what holds on real hardware

Production and Deployment

Federated and distributed training across sites
Edge deployment, monitoring, and retraining of agents already running in production

What We're Looking For

Essential

3-5 years of experience training and deploying deep RL agents in Python
PyTorch or JAX, and RL libraries such as Gymnasium
A background in physical systems - engineering (mechanical, electrical, structural, biomedical), physics, robotics, autonomous driving, or control systems - and the instinct to reason about what's physically possible, not only what's mathematically possible
Comfortable iterating between research exploration and the engineering needed to run on a live site
A degree in engineering, CS, or physics

Useful

Control systems (classical control, MPC), or HVAC, thermodynamics, power systems, or data centre operations
Federated learning, distributed training, or edge ML deployment
Simulation experience - building or using physics-based simulators, digital twins, surrogate models, or large physics models
Published research or open-source contributions

Who You Are

You want both halves of this job. You'll run experiments and read papers, but you also want your work controlling real equipment, with the constraints that come with that. RL experience limited to advertising or multi-armed bandits won't carry over here - the physical world doesn't behave like a recommendation system. Someone with a pure maths or CS background and no feel for physical systems will struggle, and so will anyone after a pure research seat or a pure production one.

This sits in the middle.

What's On Offer

£110K-£150K, plus competitive equity
A genuine technical problem: RL on physical systems, under real constraints, deployed on live infrastructure
Direct access to the CTO and founding team
Hybrid working, one day a week in the King's Cross office
Visa sponsorship available on a case-by-case basis

Get in touch for a confidential conversation: Imogen@waverecruitment.co.uk

Job Overview

Posted Date: May 31, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: United Kingdom

Annual Salary: 110,000 - 150,000 GBP

Category: Machine Learning

Company: wave recruitment
