Senior Software Engineer for AI Model Evaluation

Alignerr • United States

Remote

AI Summary

Evaluate AI models on complex software engineering tasks, identify bugs and edge cases, and provide precise feedback. 3+ years of software engineering experience required. Strong proficiency in at least one programming language.

Key Highlights

Evaluate AI models on complex software engineering tasks

Identify bugs and edge cases

Provide precise feedback

Key Responsibilities

Evaluate frontier AI models on complex software engineering tasks

Hunt for bugs, logical errors, hallucinations, and reliability issues in AI-generated code

Design and review prompts, test cases, and evaluation scenarios that expose model weaknesses

Technical Skills Required

TypeScript, Ruby, Java, C++

Benefits & Perks

Fully remote work

Flexible contract hours

Potential for ongoing work and contract extension

Nice to Have

Experience across multiple programming languages or paradigms

Background in code review, QA engineering, or technical writing

Job Description

About The Role

What if your engineering expertise could directly shape how the next generation of AI writes, reasons about, and debugs code? We're looking for experienced software engineers to evaluate cutting-edge AI models on complex, real-world coding tasks — finding the failure modes, hallucinations, and edge cases that only a seasoned engineer would catch.

This is a fully remote, flexible contract role built for engineers who think critically, debug instinctively, and know what good code actually looks like.

Organization: Alignerr
Type: Hourly Contract
Location: Remote
Commitment: 10–40 hours/week

What You'll Do

Evaluate frontier AI models on complex software engineering tasks — from algorithm design to system architecture
Hunt for bugs, logical errors, hallucinations, and reliability issues in AI-generated code
Design and review prompts, test cases, and evaluation scenarios that expose model weaknesses
Write precise, structured feedback documenting model strengths, failure modes, and edge cases
Work across multiple languages and codebases to assess how well AI generalizes across real engineering contexts
Think like a senior reviewer — not a user — and push models beyond surface-level correctness

Who You Are

3+ years of professional software engineering experience
Strong proficiency in at least one of: TypeScript, Ruby, Java, or C++
Excellent written and spoken English — you communicate complex technical reasoning clearly
Sharp debugging instincts — you notice when something is subtly wrong, not just obviously broken
Familiar with modern development workflows: Git, CLI tooling, testing frameworks, and IDEs
Able to critically evaluate AI output rather than simply accept it at face value

Nice to Have

Experience across multiple programming languages or paradigms
Background in code review, QA engineering, or technical writing
Prior exposure to LLMs, AI evaluation, or prompt engineering workflows
Comfort working with ambiguous tasks and defining your own evaluation criteria

Why Join Us

Work on frontier AI projects alongside leading research labs
Fully remote and flexible — set your own hours and work from anywhere
Freelance autonomy with the structure of meaningful, task-based engineering work
Make a direct, tangible impact on how AI understands and produces real-world code
Potential for ongoing work and contract extension as new projects launch

Job Overview

Posted Date: May 02, 2026
Employment Type: Contract
Experience Level: Mid-Senior level
Location: United States
Category: Programming
Company: Alignerr
