Senior Machine Learning Engineer - LLM Evaluation and Task Creation

Remote
AI Summary

Design and evaluate advanced machine learning systems, create high-quality ML tasks and datasets, and work closely with researchers and engineers to ensure dataset quality and sound evaluation methodology.

Key Highlights
Frame and design novel ML problems
Build, optimize, and evaluate machine learning models
Conduct robustness testing and bias analysis
Technical Skills Required
Python, PyTorch, TensorFlow, Kaggle, DrivenData, AWS, GCP, Azure, MLOps tools (Weights & Biases, MLflow, Airflow, Docker, etc.)
Benefits & Perks
$35 per hour
Fully remote and asynchronous work
Weekly payments via Stripe Connect

Job Description


Senior Machine Learning Engineer – LLM Evaluation / Task Creation (India Based)

Hourly Contract | Remote | $35 per hour

Role Description

Mercor is hiring Senior Machine Learning Engineers to collaborate with a leading AI research lab on the design, evaluation, and benchmarking of advanced machine learning systems. In this role, you will create high-quality ML tasks, datasets, and evaluation workflows that directly support the training and assessment of next-generation AI and LLM-based systems.

This position is ideal for engineers with strong applied ML experience and competitive ML backgrounds (e.g., Kaggle), who can translate real-world problem statements into robust, reproducible machine learning pipelines. You will work closely with researchers and engineers to ensure dataset quality, sound evaluation methodology, and impactful experimentation.

Key Responsibilities

  • Frame and design novel ML problems to enhance the reasoning and performance of LLMs
  • Build, optimize, and evaluate machine learning models across classification, prediction, NLP, recommendation, and generative tasks
  • Run rapid experimentation cycles and iterate on model performance
  • Perform advanced feature engineering and data preprocessing
  • Conduct robustness testing, adversarial evaluation, and bias analysis
  • Fine-tune and evaluate transformer-based models when required
  • Maintain clear documentation for datasets, experiments, and modeling decisions
  • Stay up to date with the latest ML research, tools, and best practices

Required Qualifications

  • 3+ years of full-time experience in applied machine learning
  • Technical degree in Computer Science, Electrical Engineering, Statistics, Mathematics, or a related field
  • Demonstrated competitive ML experience (Kaggle, DrivenData, or equivalent)
  • Evidence of strong performance in ML competitions (leaderboard rankings, medals, finalist placements)
  • Strong proficiency in Python, PyTorch/TensorFlow, and modern ML/NLP frameworks
  • Solid understanding of statistics, optimization, model architectures, and evaluation techniques
  • Experience with ML pipelines, experiment tracking, and distributed training
  • Strong problem-solving, analytical, and communication skills
  • Experience working with cloud platforms (AWS, GCP, or Azure)
  • Fluency in English
  • Must be based in India

Preferred / Nice to Have

  • Kaggle Grandmaster/Master or multiple Gold Medals
  • Experience creating ML benchmarks, evaluations, or challenge problems
  • Background in LLMs, generative models, or multimodal learning
  • Experience with large-scale distributed training
  • Prior work in AI research, ML platforms, or infrastructure teams
  • Contributions to open-source projects, blogs, or research publications
  • Experience with LLM fine-tuning, vector databases, or generative AI workflows
  • Familiarity with MLOps tools (Weights & Biases, MLflow, Airflow, Docker, etc.)
  • Experience optimizing inference performance and deploying models at scale

Compensation & Contract

  • Rate: $35 per hour
  • Engagement: Independent contractor
  • Work Mode: Fully remote and asynchronous
  • Payments: Weekly via Stripe Connect

⚡ PS: Mercor reviews applications daily. Please complete your interview and onboarding steps to be considered for this opportunity. ⚡

