Senior Software Engineer for AI Model Evaluation

Alignerr, United States
Remote
AI Summary

Evaluate AI models on complex software engineering tasks, identify bugs and edge cases, and provide precise feedback. The role requires 3+ years of professional software engineering experience and strong proficiency in at least one of TypeScript, Ruby, Java, or C++.

Key Highlights

  • Evaluate AI models on complex software engineering tasks
  • Identify bugs and edge cases
  • Provide precise feedback

Key Responsibilities

  • Evaluate frontier AI models on complex software engineering tasks
  • Hunt for bugs, logical errors, hallucinations, and reliability issues in AI-generated code
  • Design and review prompts, test cases, and evaluation scenarios that expose model weaknesses

Technical Skills Required

  • TypeScript, Ruby, Java, C++

Benefits & Perks

  • Fully remote work
  • Flexible contract hours
  • Potential for ongoing work and contract extension

Nice to Have

  • Experience across multiple programming languages or paradigms
  • Background in code review, QA engineering, or technical writing

Job Description


About The Role

What if your engineering expertise could directly shape how the next generation of AI writes, reasons about, and debugs code? We're looking for experienced software engineers to evaluate cutting-edge AI models on complex, real-world coding tasks — finding the failure modes, hallucinations, and edge cases that only a seasoned engineer would catch.

This is a fully remote, flexible contract role built for engineers who think critically, debug instinctively, and know what good code actually looks like.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Commitment: 10–40 hours/week

What You'll Do

  • Evaluate frontier AI models on complex software engineering tasks — from algorithm design to system architecture
  • Hunt for bugs, logical errors, hallucinations, and reliability issues in AI-generated code
  • Design and review prompts, test cases, and evaluation scenarios that expose model weaknesses
  • Write precise, structured feedback documenting model strengths, failure modes, and edge cases
  • Work across multiple languages and codebases to assess how well AI generalizes across real engineering contexts
  • Think like a senior reviewer — not a user — and push models beyond surface-level correctness

Who You Are

  • 3+ years of professional software engineering experience
  • Strong proficiency in at least one of: TypeScript, Ruby, Java, or C++
  • Excellent written and spoken English — you communicate complex technical reasoning clearly
  • Sharp debugging instincts — you notice when something is subtly wrong, not just obviously broken
  • Familiar with modern development workflows: Git, CLI tooling, testing frameworks, and IDEs
  • Able to critically evaluate AI output rather than simply accept it at face value

Nice to Have

  • Experience across multiple programming languages or paradigms
  • Background in code review, QA engineering, or technical writing
  • Prior exposure to LLMs, AI evaluation, or prompt engineering workflows
  • Comfort working with ambiguous tasks and defining your own evaluation criteria

Why Join Us

  • Work on frontier AI projects alongside leading research labs
  • Fully remote and flexible — set your own hours and work from anywhere
  • Freelance autonomy with the structure of meaningful, task-based engineering work
  • Make a direct, tangible impact on how AI understands and produces real-world code
  • Potential for ongoing work and contract extension as new projects launch
