Senior Python Engineer for LLM Evaluation

hackajob • India
Remote
AI Summary

Part-time, task-based role for a Senior Python Engineer focused on evaluating and testing Large Language Models (LLMs). The work involves designing test cases, evaluating failure modes, and analyzing LLM behavior. Requires Python expertise, strong engineering judgment, and comfort debugging code.

Key Highlights
Evaluating Large Language Models (LLMs)
Designing structured test cases
Analyzing LLM failure modes
Working directly with Git repositories and existing codebases
Technical Skills Required
Python, Git, Structured testing
Benefits & Perks
Part-time, remote work
Flexible schedule

Job Description


Senior Python Engineer / LLM Evaluation (Part-Time, Remote)


We are hiring experienced Python engineers for part-time, task-based work focused on evaluating and testing Large Language Models (LLMs).


This is not traditional QA and not junior AI labeling work. We are looking for senior engineers who can reason deeply about system behavior, ambiguity, and real-world usage, not just write code.


What You’ll Do


• Design structured test cases that simulate real human workflows

• Define gold-standard outputs and expected behaviors

• Analyze LLM failure modes such as hallucinations, bias, and context limitations

• Work directly with Git repositories and existing codebases

• Navigate incomplete documentation and ambiguous requirements

• Apply engineering judgment to determine what “good” looks like


Who You Are


• 3+ years of software development experience (Python-focused)

• Python is your primary language

• Strong hands-on Git experience in real projects

• Comfortable reading and debugging code you didn’t write

• Able to reason about edge cases, trade-offs, and ambiguity

• Strong written and spoken English (B2+)


Nice to Have


• QA or structured testing experience (must be code-capable)

• Experience evaluating AI or LLM systems

• Familiarity with evaluation metrics such as precision, recall, and coverage

• Experience working with Docker

• Consulting or freelance engineering background


What We’re Looking For


We value engineers who can explain why something fails, not just that it fails. If you naturally think in terms of scenarios, assertions, failure modes, and user expectations, you’ll thrive here.


This role suits senior backend Python engineers, ML engineers who still code regularly, and technically strong evaluators with real production experience.


Fully remote. Flexible schedule. Task-based delivery.


If you’re interested in applying your engineering judgment to real-world AI system evaluation, we’d love to hear from you.

