RL Environments Engineer

XOR, Czechia
Remote
AI Summary

Design and build production-grade MLE/SWE environments for LLM interaction. Target specific language models while maintaining a rigorous difficulty distribution. Deliver high-quality tasks with minimal supervision.

Key Responsibilities
Architect Environments
Model Targeting
Rapid Delivery
Iterative Design
Technical Skills Required
Python, Docker, C++, Rust, Scala, Java
Benefits & Perks
Base Pay: $90 – $160 USD / hour
Performance Bonuses
Flexibility: 100% Remote
Growth: Clear potential path to Full-Time Employment
Nice to Have
Experience designing environments/tasks for RL and/or evaluations
Experience in high-stakes or regulated domains
ML systems experience

Job Description


XOR is exclusively hiring on behalf of an elite Silicon Valley AI startup currently operating in stealth mode.


Our partner is redefining the future of AI by building the next generation of training data. While today’s LLMs are powerful, they often struggle with real-world tasks that fall outside their training distribution. This team is solving that by creating sophisticated reinforcement learning (RL) environments that ground AI feedback in reality.


Why Join?

  • Elite Lineage: The founding team comes directly from Anthropic’s data team, having built the core data infrastructure, tokenizers, and datasets behind the Claude models.
  • Tier-1 Backing: Seed round backed by leading Silicon Valley VCs.
  • Strategic Impact: You will work directly with top-tier AI labs, influencing the timelines and priorities of the world’s most advanced models.
  • True Innovation: This isn't about "wrapping an API"—it's about architecting the environments where the next leap in intelligence will happen.


Brief Description of the Vacancy

We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and, once onboarded, deliver roughly one task every 8-10 hours. This is a remote contractor role that requires at least 4 hours of overlap with PST working hours and advanced English (C1/C2).


Key Responsibilities:

  • Architect Environments: Design and build production-grade MLE/SWE environments for LLM interaction.
  • Model Targeting: Tailor tasks to specific language models while maintaining a rigorous difficulty distribution.
  • Rapid Delivery: Once onboarded, maintain a high-velocity output (~1 complex task per 8-10 hours).
  • Iterative Design: Refine and edit tasks within 24 hours based on customer/researcher feedback.


What we’re looking for (must-haves)

  • Strong Python (engineering-quality, not notebook-only).
  • Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
  • Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
  • Docker + production mindset (debugging, reliability, iteration speed).
  • ≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
  • Ability to meet throughput expectations and respond quickly to feedback.


Strong Signals (Nice-to-Haves):

  • Experience designing environments/tasks for RL and/or evaluations.
  • Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
  • ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
  • Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
  • Exposure to RL / bandits / agentic systems (not required, but a strong signal).


Not a fit if

  • You’re primarily a prompt engineer without strong ML/engineering foundations.
  • You’re a research-only / academic-only profile with little or no shipping/production ownership.
  • You’ve only built in notebooks or rely heavily on managed AutoML tools.


Compensation & Benefits

  • Base Pay: $90 – $160 USD / hour ($15,000 – $22,500 monthly equivalent), based on seniority and technical performance.
  • Performance Bonuses: Monthly bonuses based on task delivery and quality.
  • Flexibility: 100% Remote, 40 hours per week, with a flexible schedule.
  • Growth: A potential path to Full-Time Employment (FTE) and relocation for high performers.


The Hiring Process

  1. Application: Submit your CV and a brief note on your technical track.
  2. Initial Challenge: A short take-home form/task to assess baseline skills. You can also schedule a call with XOR during this stage to learn more about the client.
  3. Technical Deep Dive: An interview with the client's technical leadership.
  4. Final Coding Task: A comprehensive assignment to prove your production-ready skills.

Note: Time spent on the final take-home assignment is compensated if you receive an offer.

