RL Environments Engineer

xor • Czechia

Remote

AI Summary

Design and build production-grade MLE/SWE environments for LLM interaction. Target specific language models while maintaining a rigorous difficulty distribution. Deliver high-quality tasks with minimal supervision.

Key Responsibilities

Architect Environments

Model Targeting

Rapid Delivery

Iterative Design

Technical Skills Required

Python, Docker, C++, Rust, Scala, Java

Benefits & Perks

Base Pay: $90 – $160 USD / hour

Performance Bonuses

Flexibility: 100% Remote

Growth: Clear potential path to Full-Time Employment

Nice to Have

Experience designing environments/tasks for RL and/or evaluations

Experience in high-stakes or regulated domains

ML systems experience

Job Description

XOR is exclusively hiring on behalf of an elite Silicon Valley AI startup currently operating in stealth mode.

Our partner is redefining the future of AI by building the next generation of training data. While today’s LLMs are powerful, they often struggle with real-world tasks that fall outside their training distribution. This team is solving that by creating sophisticated reinforcement learning (RL) environments that ground AI feedback in reality.

Why Join?

Elite Lineage: The founding team comes directly from Anthropic’s data team, having built the core data infrastructure, tokenizers, and datasets behind the Claude models.
Tier-1 Backing: Backed by the world’s most prestigious Silicon Valley VCs (Seed round).
Strategic Impact: You will work directly with top-tier AI labs, influencing the timelines and priorities of the world’s most advanced models.
True Innovation: This isn't about "wrapping an API"—it's about architecting the environments where the next leap in intelligence will happen.

Brief Description of the Vacancy

We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver about one task every 10 hours. This is a remote contractor role that requires at least 4 hours of overlap with PST and advanced English (C1/C2).

Key Responsibilities:

Architect Environments: Design and build production-grade MLE/SWE environments for LLM interaction.
Model Targeting: Tailor tasks to specific language models while maintaining a rigorous difficulty distribution.
Rapid Delivery: Once onboarded, maintain a high-velocity output (~1 complex task per 8-10 hours).
Iterative Design: Refine and edit tasks within 24 hours based on customer/researcher feedback.

What we’re looking for (must-haves)

Strong Python (engineering-quality, not notebook-only).
Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
Docker + production mindset (debugging, reliability, iteration speed).
≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
Ability to meet throughput expectations and respond quickly to feedback.

Strong Signals (Nice-to-Haves):

Experience designing environments/tasks for RL and/or evaluations.
Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
Exposure to RL / bandits / agentic systems (not required, but a strong signal).

Not a fit if

You’re primarily a prompt engineer without strong ML/engineering foundations.
You’re a research-only / academic-only profile with little or no shipping/production ownership.
You’ve only built in notebooks or rely heavily on managed AutoML tools.

Compensation & Benefits

Base Pay: $90 – $160 USD / hour ($15,000 – $22,500 monthly equivalent), based on seniority and technical performance.
Performance Bonuses: Monthly bonuses based on task delivery and quality.
Flexibility: 100% Remote, 40 hours per week, with a flexible schedule.
Growth: A clear potential path to Full-Time Employment (FTE) and relocation for high performers.

The Hiring Process

Application: Submit your CV and a brief note on your technical track.
Initial Challenge: A short take-home form/task to assess baseline skills. You can also schedule a call with XOR during this stage to learn more about the client.
Technical Deep Dive: An interview with the client's technical leadership.
Final Coding Task: A comprehensive assignment to prove your production-ready skills.

Note: Time spent on the final take-home assignment is compensated if you receive an offer.

Job Overview

Posted Date: Mar 09, 2026

Employment Type: Full-time

Experience Level: Entry level

Location: Czechia

Annual Salary: 222,000 USD

Category: Programming

Company: xor
