AI System Output Evaluator

rex.zone • Brazil
Remote

Job Description


About The Role

Join Rex.zone to evaluate and improve AI system outputs using structured human-in-the-loop feedback. You will support RLHF workflows, prompt evaluation, and QA evaluation for large language models, ensuring training data quality and measurable model performance improvement across domains and languages.

What You Will Do

  • Design and run LLM evaluation tasks (pairwise ranking, rubric-based grading, error categorization)
  • Support RLHF-style preference data collection and analysis
  • Audit data labeling outputs and QA evaluation results for compliance with annotation guidelines
  • Perform prompt evaluation and regression testing across model versions
  • Track quality metrics and reviewer calibration using spreadsheets and/or scripting
  • Document failure modes and coordinate with stakeholders to resolve systematic issues
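
To give a flavor of the scripting this role involves, here is a minimal Python sketch of turning pairwise-ranking judgments into per-model win rates, one common way to summarize preference data across model versions. The judgment format and model names are illustrative, not Rex.zone's actual tooling.

```python
from collections import Counter

def win_rates(judgments):
    """Compute per-model win rates from pairwise preference judgments.

    Each judgment is a tuple (model_a, model_b, winner), where winner
    is "a", "b", or "tie". A tie awards half a win to each side.
    """
    wins = Counter()
    appearances = Counter()
    for model_a, model_b, winner in judgments:
        appearances[model_a] += 1
        appearances[model_b] += 1
        if winner == "a":
            wins[model_a] += 1.0
        elif winner == "b":
            wins[model_b] += 1.0
        elif winner == "tie":
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            raise ValueError(f"unexpected winner label: {winner!r}")
    return {m: wins[m] / appearances[m] for m in appearances}

# Hypothetical judgments comparing two model versions.
judgments = [
    ("model-v1", "model-v2", "b"),
    ("model-v1", "model-v2", "b"),
    ("model-v1", "model-v2", "tie"),
    ("model-v1", "model-v2", "a"),
]
print(win_rates(judgments))  # model-v1: 0.375, model-v2: 0.625
```

In practice the same tallies feed regression testing: a drop in a new version's win rate against the previous one flags a candidate failure mode to document.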

Core Workflows

  • LLM training pipelines and evaluation harnesses
  • Gold sets, calibration sessions, blind audits, and inter-annotator agreement checks
  • NLP evaluation (classification, NER, summarization) and content safety labeling
  • Computer vision/multimodal annotation review when needed
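
One standard inter-annotator agreement check is Cohen's kappa, which corrects raw agreement between two reviewers for agreement expected by chance. The sketch below uses hypothetical pass/fail labels for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' labels on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Two reviewers grading the same four outputs.
reviewer_1 = ["pass", "pass", "fail", "pass"]
reviewer_2 = ["pass", "fail", "fail", "pass"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

Teams typically set a kappa threshold below which a calibration session is triggered before annotation continues.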

Required Qualifications

  • Mid- to senior-level experience in engineering, applied ML, QA evaluation, or data operations
  • Strong understanding of rubrics, preference data, and RLHF concepts
  • Ability to write clear annotation guidelines and drive calibration
  • Proficiency with spreadsheets and/or scripting (Python or SQL preferred)
  • Strong written communication for documenting edge cases and acceptance criteria

Why Rex.zone

Work remotely with distributed teams supporting production-grade LLM training pipelines and disciplined QA evaluation processes that deliver measurable outcomes.
