AI System Output Evaluator

rex.zone • Brazil
Remote

Job Description


About The Role

Join Rex.zone to evaluate and improve AI system outputs using structured human-in-the-loop feedback. You will support RLHF workflows, prompt evaluation, and QA evaluation for large language models, ensuring training data quality and measurable model performance improvement across domains and languages.

What You Will Do

  • Design and run LLM evaluation tasks (pairwise ranking, rubric-based grading, error categorization)
  • Support RLHF-style preference data collection and analysis
  • Audit data labeling outputs and QA evaluation results for compliance with annotation guidelines
  • Perform prompt evaluation and regression testing across model versions
  • Track quality metrics and reviewer calibration using spreadsheets and/or scripting
  • Document failure modes and coordinate with stakeholders to resolve systematic issues
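
To give a flavor of the scripting this role involves, here is a minimal Python sketch of turning pairwise-ranking judgments into per-model win rates, one common way to summarize preference data across model versions. The judgment format and model names are illustrative, not Rex.zone's actual tooling.

```python
from collections import Counter

def win_rates(judgments):
    """Compute per-model win rates from pairwise preference judgments.

    Each judgment is a tuple (model_a, model_b, winner), where winner
    is "a", "b", or "tie". A tie awards half a win to each side.
    """
    wins = Counter()
    appearances = Counter()
    for model_a, model_b, winner in judgments:
        appearances[model_a] += 1
        appearances[model_b] += 1
        if winner == "a":
            wins[model_a] += 1.0
        elif winner == "b":
            wins[model_b] += 1.0
        elif winner == "tie":
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            raise ValueError(f"unexpected winner label: {winner!r}")
    return {m: wins[m] / appearances[m] for m in appearances}

# Hypothetical judgments comparing two model versions.
judgments = [
    ("model-v1", "model-v2", "b"),
    ("model-v1", "model-v2", "b"),
    ("model-v1", "model-v2", "tie"),
    ("model-v1", "model-v2", "a"),
]
print(win_rates(judgments))  # model-v1: 0.375, model-v2: 0.625
```

In practice the same tallies feed regression testing: a drop in a new version's win rate against the previous one flags a candidate failure mode to document.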

Core Workflows

  • LLM training pipelines and evaluation harnesses
  • Gold sets, calibration sessions, blind audits, and inter-annotator agreement checks
  • NLP evaluation (classification, NER, summarization) and content safety labeling
  • Computer vision/multimodal annotation review when needed
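
One standard inter-annotator agreement check is Cohen's kappa, which corrects raw agreement between two reviewers for agreement expected by chance. The sketch below uses hypothetical pass/fail labels for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' labels on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Two reviewers grading the same four outputs.
reviewer_1 = ["pass", "pass", "fail", "pass"]
reviewer_2 = ["pass", "fail", "fail", "pass"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

Teams typically set a kappa threshold below which a calibration session is triggered before annotation continues.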

Required Qualifications

  • Mid- to senior-level experience in engineering, applied ML, QA evaluation, or data operations
  • Strong understanding of rubrics, preference data, and RLHF concepts
  • Ability to write clear annotation guidelines and drive calibration
  • Proficiency with spreadsheets and/or scripting (Python or SQL preferred)
  • Strong written communication for documenting edge cases and acceptance criteria

Why Rex.zone

Work remotely with distributed teams supporting production-grade LLM training pipelines and disciplined QA evaluation processes that deliver measurable outcomes.
