AI Engineer - AI Agent Evaluation and Benchmarking

dusker ai • India

Remote

AI Summary

Design, develop, and evaluate AI models and intelligent agents with a focus on reasoning, robustness, and safety. Build evaluation frameworks and tools to help organizations deploy reliable AI systems at scale. Requires strong CS foundation, Python proficiency, and experience with neural networks and NLP.

Key Highlights

Design and develop AI models and intelligent agents for real-world performance

Build evaluation frameworks and tools for AI benchmarking and testing workflows

Focus on reasoning, reliability, and safety of AI systems and LLMs

Technical Skills Required

Python

Neural networks

Natural Language Processing (NLP)

Benefits & Perks

Remote work

Job Description

About Dusker AI

Dusker AI specializes in benchmarking and evaluating AI agents for real-world performance. Using expert-driven evaluation frameworks, we assess reasoning, reliability, adaptability, and safety—going beyond traditional metrics to identify strengths and weaknesses in AI systems.

We work with organizations building conversational AI and autonomous systems, helping ensure their solutions are deployment-ready through rigorous testing, benchmarking, and evaluation.

Role Overview

We're looking for an AI Engineer to join our remote team. In this role, you'll design, develop, and evaluate AI models and intelligent agents with a focus on reasoning, robustness, and safety across diverse real-world use cases.

You'll collaborate with a talented team to build evaluation frameworks, improve AI performance, and develop tools that help organizations deploy reliable AI systems at scale.

Key Responsibilities

Design, develop, and optimize AI models and intelligent agents.
Build and evaluate neural network architectures for various AI applications.
Develop NLP solutions, including prompt engineering and LLM evaluation.
Create software tools that support AI benchmarking and testing workflows.
Analyze model behavior to identify performance gaps and improvement opportunities.
Design robust evaluation methodologies for reasoning, reliability, and safety.
Collaborate with cross-functional teams to improve deployment readiness.
Document methodologies and contribute to reusable, production-quality codebases.
Help scale automated benchmarking and evaluation processes.

Required Qualifications

Strong foundation in Computer Science and Software Development, including data structures, algorithms, and writing production-quality code.
Experience with machine learning, neural networks, and pattern recognition.
Hands-on experience with Natural Language Processing (NLP), including model fine-tuning, prompt engineering, and evaluation of language models.
Proficiency in Python and modern AI/ML frameworks such as PyTorch, TensorFlow, or JAX.
Familiarity with MLOps practices and reproducible experimentation.
Ability to design evaluation methodologies and communicate technical findings effectively.
Bachelor's degree or higher in Computer Science, Data Science, Electrical Engineering, or a related field (or equivalent practical experience).

Preferred Qualifications

Experience evaluating AI systems for safety, robustness, and reliability.
Experience working with Large Language Models (LLMs) and AI agents.
Familiarity with AI benchmarking frameworks and model evaluation pipelines.
Ability to work independently in a fully remote, collaborative environment.

Job Overview

Posted Date: Jun 28, 2026

Employment Type: Contract

Experience Level: Entry level

Location: India

Category: Programming

Company: dusker ai
