Senior ML Engineer - Inference and Optimization

Stealth Startup • United States

Visa Sponsorship • Relocation

AI Summary

Transform PyTorch research code into optimized inference solutions. Deploy and integrate researcher-trained model checkpoints. Conduct thorough performance profiling and benchmarking.

Key Responsibilities

Deploy and integrate researcher-trained model checkpoints into our cloud infrastructure and production pipelines.

Conduct thorough performance profiling and benchmarking to identify and eliminate computational bottlenecks.

Implement neural network optimization techniques including quantization, pruning, and architectural refinements while preserving model accuracy.

Technical Skills Required

PyTorch

Deep learning models

Multi-GPU inference

Large-scale model serving

Efficient attention mechanisms

Model quantization

Inference frameworks

Cloud platforms

Storage solutions

Modern training frameworks

Benefits & Perks

Generous health, dental, and vision coverage

Unlimited PTO

Paid parental leave

Relocation support

Job Description

The role:

As our first ML Engineer specializing in inference and optimization, you'll bridge the gap between cutting-edge research models and production systems. Your expertise will transform PyTorch research code into highly optimized, low-latency inference solutions that power our user-facing applications. You'll work closely with our GenAI researchers, vision ML engineers, and backend team to deliver exceptional performance.

What you’ll do:

Deploy and integrate researcher-trained model checkpoints into our cloud infrastructure and production pipelines.
Conduct thorough performance profiling and benchmarking to identify and eliminate computational bottlenecks.
Implement neural network optimization techniques including quantization, pruning, and architectural refinements while preserving model accuracy.
Develop efficient training and fine-tuning strategies with optimal precision trade-offs and parallelism.
Build and maintain scalable multi-GPU inference solutions with sophisticated model parallelism and serving architectures.
Collaborate with the research team to ensure optimizations integrate smoothly with model development workflows.

You may be a strong fit if you:

Have experience deploying and optimizing deep learning models for production environments, particularly with multi-GPU inference and large-scale model serving.
Are well-versed in cutting-edge techniques for optimizing both inference and training workloads.
Possess strong knowledge of efficient attention mechanisms and algorithms.
Have hands-on experience implementing model quantization and working with inference frameworks.
Can write production-quality code and successfully integrate ML models into robust inference pipelines.
Are familiar with various cloud platforms, storage solutions, and modern training frameworks.

Logistics:

This role is based in San Jose, where we work in person. We believe the best ideas come from being in the same room.
We sponsor visas. We are committed to working through the process together for the right candidates. If you're currently outside the US, we're also committed to helping you relocate to the US throughout this process.
We offer generous health, dental, and vision coverage, unlimited PTO, paid parental leave, and relocation support as needed.
Don't meet every single qualification? That’s okay — we care more about your trajectory than checking every box. If the role excites you and the mission resonates, we'd love to hear from you.

Note: In the event your application is successful and an offer of employment is made to you, any offer of employment will be conditional on the results of a background check, performed by a third party acting on our behalf.

Job Overview

Posted Date: May 23, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: United States

Category: Networking

Company: Stealth Startup
