Distributed Systems Engineer for AI Model Training and Inference

Acceler8 Talent • United States
Visa Sponsorship • Relocation
AI Summary

Join our team as a Distributed Systems Engineer to design and build distributed data and coordination systems for ultra-long-context model training and inference. You will develop high-performance storage and caching systems to support large-scale GPU workloads and work deep in the internals of modern deep learning frameworks in highly distributed environments.

Key Highlights
Design and build distributed data and coordination systems
Develop high-performance storage and caching systems
Work deep in the internals of modern deep learning frameworks
Technical Skills Required
Distributed systems design, public cloud platforms, distributed databases, batch or stream processing systems, distributed file systems, deep learning frameworks, GPU workloads, Kubernetes, Docker
Benefits & Perks
Salary of $225K–$550K dependent on experience
Significant equity
Great benefits, including a 401(k) with 6% company match, comprehensive health coverage and unlimited PTO
Visa sponsorship and SF relocation stipend available

Job Description


Distributed Systems Engineer - San Francisco, CA


A company building frontier-scale AI models that automate software engineering and AI research, combining ultra-long context, domain-specific RL and massive compute infrastructure, is looking for a Distributed Systems Engineer to join its team.


What Will I Be Doing:


  • Design and build distributed data and coordination systems that enable ultra-long-context model training and inference
  • Develop high-performance storage and caching systems to support large-scale GPU workloads
  • Work deep in the internals of modern deep learning frameworks in highly distributed environments
  • Build automation for fault detection, recovery and high availability across GPU clusters
  • Troubleshoot complex, cross-stack issues spanning GPUs, networking, storage, operating systems and cloud infrastructure


What We’re Looking For:


  • Deep expertise in distributed systems design and public cloud platforms
  • Proven experience designing and operating highly available, high-throughput data systems
  • Strong knowledge of distributed databases, batch or stream processing systems, and/or distributed file systems
  • Exceptional problem-solving ability across the full systems stack
  • A hands-on mindset with the curiosity and grit to learn fast in a frontier technical environment


What’s In It for Me:


  • Salary of $225K–$550K dependent on experience + significant equity
  • Great benefits, including a 401(k) with 6% company match, comprehensive health coverage and unlimited PTO
  • Visa sponsorship and SF relocation stipend available
  • Well-funded ($465M+) with backing from top investors


Apply now for immediate consideration!

