Inference Runtime Engineer

Inferact, Singapore
Visa Sponsorship
AI Summary

Design and optimize the core inference engine for large language and diffusion models. Implement performance improvements across diverse hardware architectures. Contribute to vLLM's core codebase and research-driven inference techniques.

Key Highlights
Optimize LLM and diffusion model serving at the core of vLLM
Work with transformer architectures and PyTorch internals
Implement KV-cache memory management and prefix caching
Develop performant code for multimodal inference systems
Key Responsibilities
Optimize how models execute across diverse hardware and architectures
Implement performant and maintainable code for complex ML codebases
Debug and contribute to vLLM's core inference engine
Technical Skills Required
Python, PyTorch, LLM inference systems, KV-cache memory management
Benefits & Perks
Medical coverage
Dental coverage
Vision coverage

Job Description


Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.

About The Role

We're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models grow larger. Architectures shift: mixture-of-experts, multimodal, agentic. Every breakthrough demands innovation in the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference.

Skills And Qualifications

Minimum qualifications:

  • Bachelor's degree or equivalent experience in computer science, engineering, or similar.
  • Deep understanding of transformer architectures and their variants.
  • Strong programming skills in Python with experience in PyTorch internals.
  • Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).
  • Ability to read and implement model architectures and inference techniques from research papers.
  • Demonstrated ability to contribute performant, maintainable code and to debug complex ML codebases.

Preferred qualifications:

  • Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.
  • Familiarity with RL frameworks and algorithms for LLMs.
  • Experience with multimodal inference (audio/image/video/text).
  • Contributions to open-source ML or system infrastructure projects.

Bonus points if you have:

  • Implemented core features in vLLM or other inference engine projects.
  • Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc.).
  • Written widely-shared technical blogs or side projects on vLLM or LLM inference.

Logistics

  • Location: This role is based in Singapore.
  • Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is S$200,000 to S$400,000, plus equity.
  • Visa sponsorship: We sponsor visas on a case-by-case basis.
  • Benefits: Inferact offers a generous benefits package, including medical, dental, and vision coverage.
