Machine Learning Engineer (PyTorch to JAX Migration & Systems Optimization)
Seeking a specialized Machine Learning Engineer to re-architect LLMs from PyTorch to JAX for high-performance TPU/GPU clusters. This role involves structural porting, state management transition, and advanced profiling for maximum throughput and hardware efficiency. Key requirements include deep expertise in the high-performance AI stack, JAX/PyTorch ecosystems, and hardware-aware optimization. This is a 100% remote role requiring work in EST time zone, with no agency or C2C considered.
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Paradigm Infotech, is seeking the following. Apply via Dice today!
100% remote role
Must work EST hours
Agency and C2C candidates will not be considered, and visa sponsorship is not available
Machine Learning Engineer: Framework Migration & Systems Optimization (PyTorch to JAX)
We are seeking a specialized Machine Learning Engineer with deep expertise in the
high-performance AI stack. This role isn't just about "translating" code; it's about
re-architecting Large Language Models (LLMs) to thrive in a JAX-native environment,
specifically targeting TPU and GPU clusters at scale. You will bridge the gap between high-level PyTorch research implementations and the functional, XLA-optimized world of JAX, ensuring that our models achieve maximum throughput and hardware efficiency.
- Core Framework Migration
State Management: Transition imperative PyTorch state management to JAX's purely functional paradigm, handling PRNGKey management and immutable state updates with precision.
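As a minimal sketch of this transition (assuming a simple dict-of-arrays parameter layout, not any specific codebase): random keys are split explicitly rather than drawn from hidden global state, and a training step returns a new parameter pytree instead of mutating in place.

```python
import jax
import jax.numpy as jnp

# In JAX, randomness and parameters are explicit values, not hidden state.
# A key must be split before reuse; parameters are returned, never mutated.
def init_params(key):
    k_w, k_b = jax.random.split(key)
    return {
        "w": jax.random.normal(k_w, (4, 4)),
        "b": jax.random.normal(k_b, (4,)),
    }

def sgd_step(params, grads, lr=0.1):
    # Immutable update: build a new pytree rather than writing in place.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
params = init_params(key)
grads = jax.tree_util.tree_map(jnp.ones_like, params)
new_params = sgd_step(params, grads)
```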
Weight Translation: Develop robust pipelines for checkpoint conversion, ensuring numerical parity between frameworks via rigorous unit testing and error tolerance checks.
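A hedged sketch of the parity-checking idea (the state dict is simulated with NumPy arrays here to keep the example torch-free; the layer-naming and weight-transpose conventions are assumptions, not a fixed spec):

```python
import numpy as np
import jax.numpy as jnp

def convert_checkpoint(torch_state, rtol=1e-5, atol=1e-6):
    """Convert a PyTorch-style state dict (name -> ndarray) to JAX arrays.

    PyTorch Linear stores weights as (out, in); many JAX stacks expect
    (in, out), so 2-D '*.weight' tensors are transposed -- an assumed
    convention. Each tensor is round-trip checked against a tolerance."""
    jax_params = {}
    for name, array in torch_state.items():
        value = np.asarray(array)
        if name.endswith("weight") and value.ndim == 2:
            value = value.T
        converted = jnp.asarray(value)
        # Parity check: the converted tensor must match within tolerance.
        assert np.allclose(np.asarray(converted), value,
                           rtol=rtol, atol=atol), f"parity failure: {name}"
        jax_params[name] = converted
    return jax_params

# Hypothetical checkpoint fragment used only for illustration.
fake_ckpt = {"layer.weight": np.random.randn(8, 4).astype(np.float32),
             "layer.bias": np.random.randn(8).astype(np.float32)}
converted_params = convert_checkpoint(fake_ckpt)
```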
- Advanced Profiling & Numerical Stability
Numerical Debugging: Implement precision-tracking tools to ensure that BF16 or FP8 training runs remain stable during the transition, preventing gradient divergence.
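One simple form such tooling can take (a sketch, not a full monitor): scan the gradient pytree for non-finite values and report a float32 global norm, so overflowing BF16 runs are caught before they diverge.

```python
import jax
import jax.numpy as jnp

def grad_health(grads):
    """Per-step diagnostics: are all gradients finite, and what is the
    global gradient norm (accumulated in float32 for accuracy)?"""
    leaves = jax.tree_util.tree_leaves(grads)
    finite = all(bool(jnp.all(jnp.isfinite(g))) for g in leaves)
    gnorm = jnp.sqrt(sum(jnp.sum(jnp.square(g.astype(jnp.float32)))
                         for g in leaves))
    return finite, float(gnorm)

# Simulated BF16 gradients; the second set is poisoned with an overflow.
ok_grads = {"w": jnp.ones((3, 3), dtype=jnp.bfloat16)}
bad_grads = {"w": jnp.array([[jnp.inf, 1.0], [1.0, 1.0]], dtype=jnp.bfloat16)}

ok_finite, ok_norm = grad_health(ok_grads)
bad_finite, bad_norm = grad_health(bad_grads)
```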
- Scaling & Distributed Training
- Hardware-Aware Optimization
Memory Management: Apply optimizations like Selective Activation Checkpointing and memory-efficient attention (FlashAttention-2 JAX implementations) based on the specific HBM (High Bandwidth Memory) constraints of the hardware.
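In JAX terms, activation checkpointing hangs off `jax.checkpoint` (a.k.a. `jax.remat`), which drops intermediates in the forward pass and recomputes them during the backward pass, trading FLOPs for HBM. A minimal sketch on a toy block:

```python
import jax
import jax.numpy as jnp

def block(x, w):
    # An expensive intermediate we choose not to keep resident in HBM.
    return jnp.tanh(x @ w)

# Rematerialized version: forward activations are discarded and recomputed
# on the backward pass. Gradients are identical, memory use is lower.
ckpt_block = jax.checkpoint(block)

def loss(x, w):
    return jnp.sum(ckpt_block(x, w))

x = jnp.ones((2, 3))
w = jnp.ones((3, 3))
g = jax.grad(loss, argnums=1)(x, w)
```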
- Fine-Tuning & Adaptation
Architectural Evolution: Stay ahead of the curve by adapting JAX implementations for newer primitives like Mamba/SSMs, Grouped-Query Attention (GQA), and Linear Attention as they emerge in the research space.
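To make the GQA mention concrete, here is a score-computation sketch (shapes and the head-repeat trick only; softmax, masking, and caching omitted) showing how a small set of KV heads is shared across groups of query heads:

```python
import jax.numpy as jnp

def gqa_scores(q, k, num_kv_groups):
    """Grouped-Query Attention score sketch.

    q: (H, T, D) query heads; k: (H_kv, T, D) shared KV heads, with
    H = H_kv * num_kv_groups. Each KV head serves a group of queries."""
    k_expanded = jnp.repeat(k, num_kv_groups, axis=0)   # (H, T, D)
    return jnp.einsum("htd,hsd->hts", q, k_expanded) / jnp.sqrt(q.shape[-1])

q = jnp.ones((8, 5, 16))   # 8 query heads
k = jnp.ones((2, 5, 16))   # 2 shared KV heads (group size 4)
scores = gqa_scores(q, k, num_kv_groups=4)
```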
Familiarity with the following technical stack & tooling:
- Core Frameworks & Libraries:
PyTorch Ecosystem: Deep knowledge of PyTorch 2.x, including torch.compile, DistributedDataParallel (DDP), and FSDP.
Intermediate Representations: Proficiency in HLO (High-Level Optimizer) and MLIR to understand how JAX code translates to hardware instructions.
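A quick way to build that intuition: JAX's ahead-of-time lowering API exposes the StableHLO/MLIR a jitted function compiles to, so you can inspect exactly what your Python turns into.

```python
import jax
import jax.numpy as jnp

# Lower a jitted function without running it, then inspect the IR.
f = jax.jit(lambda x: jnp.sum(x * 2.0))
lowered = f.lower(jnp.ones((4,)))
hlo_text = lowered.as_text()   # StableHLO/MLIR module as a string
```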
Data Loaders: Experience migrating from torch.utils.data to Grain or tf.data for high-throughput JAX pipelines.
- Profiling & Observability
NVIDIA Nsight Systems: To analyze GPU utilization, SM occupancy, NVLink, and device memory traffic.
Perfetto: For deep-dive trace analysis across multi-node TPU/GPU clusters.
- Infrastructure & Hardware
Orchestration: Experience with Slurm or Kubernetes (K8s) for managing large-scale training jobs.
Cloud Providers: Proficiency in Google Cloud Platform (GCP) for TPUs or AWS/Azure for high-end GPU instances.
Core Skills & Competencies
- Software Engineering Excellence
Asynchronous Programming: Understanding JAX's asynchronous dispatch model and how to avoid "host-sync" bottlenecks.
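A small illustration of the dispatch model in question: a jitted call returns immediately while the device computes, and any host-side read of the value (printing, `float()`, `.item()`, converting to NumPy) forces a synchronization. In hot loops, such implicit syncs serialize host and device, so barriers should be explicit and deliberate.

```python
import jax
import jax.numpy as jnp

@jax.jit
def step(x):
    return jnp.sum(x * x)

x = jnp.ones((1024, 1024))
# Dispatch is asynchronous: `step(x)` returns a handle immediately
# while the device computes in the background.
y = step(x)

# Pulling the value to the host forces a sync; do it deliberately,
# not accidentally inside a tight training loop.
y.block_until_ready()
value = float(y)   # safe: the result is already materialized
```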
Testing Rigor: Ability to write property-based tests for numerical stability.
- Distributed Systems Knowledge
Network Topology: Understanding how rack-level interconnects (e.g., InfiniBand) affect the choice of 3D parallelism strategies.
- Mathematical & AI Domain Expertise (Desirable)
Mixed Precision Training: Expert-level knowledge of Stochastic Rounding, Loss Scaling, and the nuances of BF16 vs. FP8 training.
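For the loss-scaling piece, a minimal static-scaling sketch (dynamic scaling, skip-on-overflow logic, and FP8 scaling factors are omitted): the loss is multiplied by a large constant so small gradients survive reduced precision, then gradients are unscaled before the update.

```python
import jax
import jax.numpy as jnp

def scaled_loss_grads(loss_fn, params, scale=2.0 ** 10):
    """Static loss scaling: scale the loss so tiny gradients do not
    underflow in reduced precision, then unscale the gradients."""
    scaled = lambda p: loss_fn(p) * scale
    grads = jax.grad(scaled)(params)
    return jax.tree_util.tree_map(lambda g: g / scale, grads)

loss_fn = lambda p: jnp.sum(p ** 2)
params = jnp.full((4,), 3.0)
grads = scaled_loss_grads(loss_fn, params)   # should equal 2 * params
```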
Architecture Insight: Ability to decompose modern LLM components (KV Caches, Rotary Embeddings, Gated Linear Units) into their primitive mathematical operations.
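As an example of that decomposition, Rotary Embeddings reduce to rotating feature pairs by a position-dependent angle (the pair layout below follows one common convention; implementations differ on interleaving):

```python
import jax.numpy as jnp

def rotary(x, positions, base=10000.0):
    """RoPE reduced to primitives: rotate each feature pair
    (x[2i], x[2i+1]) by angle positions * base**(-2i/d)."""
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (jnp.arange(0, d, 2) / d))
    angles = positions[:, None] * inv_freq[None, :]       # (T, d/2)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Plain 2-D rotation applied pairwise; layout convention assumed.
    return jnp.concatenate([x1 * cos - x2 * sin,
                            x1 * sin + x2 * cos], axis=-1)

x = jnp.ones((6, 8))                  # (seq_len, head_dim)
out = rotary(x, jnp.arange(6.0))
```

Rotations preserve per-pair norms, which is a handy property-style check when verifying a port.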