AI Engineer - AI Agent Evaluation and Benchmarking

Dusker AI, India
Remote
AI Summary

Design, develop, and evaluate AI models and intelligent agents with a focus on reasoning, robustness, and safety. Build evaluation frameworks and tools to help organizations deploy reliable AI systems at scale. Requires strong CS foundation, Python proficiency, and experience with neural networks and NLP.

Key Highlights
Design and develop AI models and intelligent agents for real-world performance
Build evaluation frameworks and tools for AI benchmarking and testing workflows
Focus on reasoning, reliability, and safety of AI systems and LLMs
Technical Skills Required
Python
Neural networks
Natural Language Processing (NLP)
Benefits & Perks
Remote work

Job Description


About Dusker AI

Dusker AI specializes in benchmarking and evaluating AI agents for real-world performance. Using expert-driven evaluation frameworks, we assess reasoning, reliability, adaptability, and safety—going beyond traditional metrics to identify strengths and weaknesses in AI systems.

We work with organizations building conversational AI and autonomous systems, helping ensure their solutions are deployment-ready through rigorous testing, benchmarking, and evaluation.


Role Overview

We're looking for an AI Engineer to join our remote team. In this role, you'll design, develop, and evaluate AI models and intelligent agents with a focus on reasoning, robustness, and safety across diverse real-world use cases.

You'll collaborate with a talented team to build evaluation frameworks, improve AI performance, and develop tools that help organizations deploy reliable AI systems at scale.


Key Responsibilities

  • Design, develop, and optimize AI models and intelligent agents.
  • Build and evaluate neural network architectures for various AI applications.
  • Develop NLP solutions, including prompt engineering and LLM evaluation.
  • Create software tools that support AI benchmarking and testing workflows.
  • Analyze model behavior to identify performance gaps and improvement opportunities.
  • Design robust evaluation methodologies for reasoning, reliability, and safety.
  • Collaborate with cross-functional teams to improve deployment readiness.
  • Document methodologies and contribute to reusable, production-quality codebases.
  • Help scale automated benchmarking and evaluation processes.


Required Qualifications

  • Strong foundation in Computer Science and Software Development, including data structures, algorithms, and writing production-quality code.
  • Experience with machine learning, neural networks, and pattern recognition.
  • Hands-on experience with Natural Language Processing (NLP), including model fine-tuning, prompt engineering, and evaluation of language models.
  • Proficiency in Python and modern AI/ML frameworks such as PyTorch, TensorFlow, or JAX.
  • Familiarity with MLOps practices and reproducible experimentation.
  • Ability to design evaluation methodologies and communicate technical findings effectively.
  • Bachelor's degree or higher in Computer Science, Data Science, Electrical Engineering, or a related field (or equivalent practical experience).


Preferred Qualifications

  • Experience evaluating AI systems for safety, robustness, and reliability.
  • Experience working with Large Language Models (LLMs) and AI agents.
  • Familiarity with AI benchmarking frameworks and model evaluation pipelines.
  • Ability to work independently in a fully remote, collaborative environment.
