Red Team Specialist for AI Model Safety

Mercor • United States
Remote
Job Description


About The Job

Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: Red Team Specialist

Type: Full-time or part-time contract work

Compensation: $50/hour

Location: Remote; restricted to the USA and Japan

Role Responsibilities

  • Red team conversational AI models and agents to identify jailbreaks, prompt injections, and misuse cases.
  • Generate high-quality human data by annotating failures, classifying vulnerabilities, and flagging systemic risks.
  • Apply structure by following taxonomies, benchmarks, and playbooks to ensure consistent testing.
  • Document findings reproducibly by producing reports, datasets, and attack cases for customer action.
  • Work independently and asynchronously to meet deadlines while enhancing AI model safety.

Qualifications

Must-Have

  • Native-level fluency in English and Japanese.
  • Prior experience in red teaming (AI adversarial work, cybersecurity, socio-technical probing).
  • Strong communication skills to explain risks to technical and non-technical stakeholders.
  • Ability to adapt and thrive across diverse projects and customers.

Preferred

  • Experience in Adversarial ML: jailbreak datasets, prompt injection, RLHF/DPO attacks, model extraction.
  • Background in Cybersecurity: penetration testing, exploit development, reverse engineering.
  • Expertise in socio-technical risk: harassment/disinfo probing, abuse analysis, conversational AI testing.
  • Creative probing skills: psychology, acting, writing for unconventional adversarial thinking.

Compensation & Legal

  • Hourly contractor; paid weekly via Stripe Connect.

Application Process (Takes 20–30 mins to complete)

  • Upload resume
  • AI interview based on your resume
  • Submit form

Resources & Support

  • For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
  • For any help or support, reach out to: support@mercor.com

PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.
