Job Description
About The Role
Join Rex.zone to evaluate and improve AI system outputs using structured human-in-the-loop feedback. You will support RLHF workflows, prompt evaluation, and QA evaluation for large language models, ensuring training data quality and measurable model performance improvement across domains and languages.
What You Will Do
- Design and run LLM evaluation tasks (pairwise ranking, rubric-based grading, error categorization)
- Support RLHF-style preference data collection and analysis
- Audit data labeling outputs and QA evaluation results for compliance with annotation guidelines
- Perform prompt evaluation and regression testing across model versions
- Track quality metrics and reviewer calibration using spreadsheets and/or scripting
- Document failure modes and coordinate with stakeholders to resolve systematic issues
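The pairwise-ranking and preference-data work above typically reduces to tallying reviewer judgments into per-model win rates. A minimal sketch, assuming a hypothetical judgment format of `(model_a, model_b, winner)` tuples (the field names and tie-handling convention here are illustrative, not a prescribed pipeline):

```python
# Tally pairwise preference judgments into per-model win rates.
# Data shape is hypothetical: (model_a, model_b, winner), where
# winner is "a", "b", or "tie".
from collections import Counter

def win_rates(judgments):
    wins = Counter()
    totals = Counter()
    for model_a, model_b, winner in judgments:
        totals[model_a] += 1
        totals[model_b] += 1
        if winner == "a":
            wins[model_a] += 1
        elif winner == "b":
            wins[model_b] += 1
        else:  # count a tie as half a win for each side
            wins[model_a] += 0.5
            wins[model_b] += 0.5
    return {m: wins[m] / totals[m] for m in totals}

judgments = [
    ("model-v1", "model-v2", "b"),
    ("model-v1", "model-v2", "b"),
    ("model-v1", "model-v2", "tie"),
    ("model-v1", "model-v2", "a"),
]
print(win_rates(judgments))  # {'model-v1': 0.375, 'model-v2': 0.625}
```

In practice the same tallies feed regression testing across model versions: a win rate shifting between releases flags a candidate regression for manual review.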
What You Will Work With
- LLM training pipelines and evaluation harnesses
- Gold sets, calibration sessions, blind audits, and inter-annotator agreement checks
- NLP evaluation (classification, NER, summarization) and content safety labeling
- Computer vision/multimodal annotation review when needed
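The inter-annotator agreement checks mentioned above are commonly quantified with Cohen's kappa, which corrects raw agreement for chance. A minimal from-scratch sketch for two reviewers labeling the same items (the `pass`/`fail` labels are hypothetical):

```python
# Cohen's kappa for two raters over the same set of items:
# (observed agreement - chance agreement) / (1 - chance agreement).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both raters pick the same label
    # if each labels at random according to their own marginal frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "fail"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```

A kappa well below the team's target is the usual trigger for a calibration session against the gold set.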
Technical Skills Required
- Mid- to senior-level experience in engineering, applied ML, QA evaluation, or data operations
- Strong understanding of rubrics, preference data, and RLHF concepts
- Ability to write clear annotation guidelines and drive calibration
- Proficiency with spreadsheets and/or scripting (Python or SQL preferred)
- Strong written communication for documenting edge cases and acceptance criteria
Benefits & Perks
Work remotely with distributed teams that support production-grade LLM training pipelines and disciplined QA evaluation processes with measurable outcomes.