AI Engineer — Production LLM Systems & Evaluation

Aurora • United States

Visa Sponsorship

AI Summary

AI Engineer responsible for owning model behavior in production and for evaluating and refining LLM-powered systems. Requires 3+ years of shipping AI in production at scale, direct ownership of model quality, and experience building evaluation sets.

Key Highlights

Own model behavior in production

Evaluate and refine LLM-powered systems

Work directly with subject matter experts and enterprise customers

Key Responsibilities

Own evaluation, workflow design, fine-tuning, and release discipline, and turn customer feedback into product improvements

Work directly with subject matter experts and enterprise customers to understand what they mean by a correct answer, where the system fails, and whether the fix belongs in the model, the prompt, the retrieval layer, or the workflow itself

Generalize one-off wins into reusable platform primitives instead of leaving them as bespoke deployments

Technical Skills Required

Machine Learning, Artificial Intelligence, Python

Benefits & Perks

$210K–$350K base + competitive equity

Hybrid or Remote work

Full-time employment

Job Description

AI Engineer — Production LLM Systems & Evaluation

New York, NY or San Francisco, CA · Hybrid or Remote · Full-time

$210K–$350K base + competitive equity

The company

The company is building software that captures expert judgment for regulated industries, starting with financial services.

The first product is an AI-powered third-party risk management platform for financial institutions. It captures the compliance reasoning that normally lives in the heads of senior experts and turns it into software that can be deployed, measured, and improved over time.

The product already serves FDIC-insured banks. The business has gone from 0 to $10M ARR in less than a year, closed a $25M Series A, and has $40M in total contract value. It went from 0 to 5 live deployments in 45 days and is on track to hit 15 live deployments next month.

The team is 8 people, founded in 2023, and includes former regulators, heads of compliance and legal at fintechs, and experienced engineers. The company is backed by leading institutional investors.

The role

This is a hands-on AI engineering role for someone who wants to own model behavior in production.

The scope includes evaluation, workflow design, fine-tuning, release discipline, and turning customer feedback into product improvements.

You will be customer-facing. A big part of the job is working directly with subject matter experts and enterprise customers to understand what they mean by a correct answer, where the system fails, and whether the fix belongs in the model, the prompt, the retrieval layer, or the workflow itself.

The output of your work should not stop at a single deployment. Customer-specific solutions should be generalized back into the core learning library so the platform gets stronger with each new customer.

The technical problem

Model quality here is a systems problem, not a prompt problem.

The product has to produce grounded, defensible output from messy input, keep its behavior stable across edge cases, and make it easy for humans to review and trust what it returns.

The hard part is building a production system with measurable quality, controlled regressions, and clear feedback loops from real users.

Why now

The hard part has shifted from proving demand to making every deployment better than the last.

With live banks, rapid deployment velocity, and recurring enterprise feedback, the next constraint is model quality at scale: evaluation, reliability, and reuse across customers.

This is the point where engineering decisions become durable platform infrastructure.

What you'll own

LLM-powered systems and agentic workflows: ship end-user experiences that are accurate, usable, and production-ready.
Evaluation frameworks: build gold sets, scoring rubrics, regression tests, and release gates that catch quality issues before customers do.
Model refinement: use fine-tuning, prompt iteration, and data-driven feedback to improve accuracy and consistency.
Customer-facing iteration: work with SMEs and enterprise users to prototype, validate, and ship improvements quickly.
Core learning library: generalize one-off wins into reusable platform primitives instead of leaving them as bespoke deployments.
Production quality: keep the system observable, measurable, and stable as the product and customer base grow.

Who this is for

You are likely a strong fit if you have:

3+ years shipping AI in production at scale, with direct ownership of model quality.
Built systems where offline evaluation, production behavior, and customer feedback all mattered.
Owned more than integration work; you have been responsible for the model behavior itself.
Experience building evaluation sets and using them to make release decisions.
Comfort working directly with technical and non-technical stakeholders, including domain experts.
Judgment about when to use rules, prompts, retrieval, fine-tuning, or workflow changes.
Experience in environments where both false positives and false negatives have real cost.
The ability to explain technical tradeoffs clearly and without hand-waving.

This role is not for you if

You want to stay in prototype mode and avoid production ownership.
You want a research-only role with no customer contact.
You prefer narrow tickets and heavy specification before you start.
You are not interested in evaluation rigor, reliability, or reuse.
You optimize for novelty over repeatable quality.

Compensation and logistics

Base salary: $210K–$350K
Equity: competitive
Location: New York, NY or San Francisco, CA
Work model: hybrid or remote
Employment: full-time
Visa sponsorship: available

About Aurora

Aurora helps exceptional engineers find the right role at some of the most ambitious startups worldwide.

We work with teams that value high ownership, strong technical standards, and clear scope.

Job Overview

Posted Date: Jun 23, 2026

Employment Type: Full-time

Experience Level: Entry level

Location: United States

Annual Salary: 252,000 USD

Category: Programming

Company: Aurora
