Inference Runtime Engineer

Inferact • Singapore

Visa Sponsorship

AI Summary

Design and optimize the core inference engine for large language and diffusion models. Implement performance improvements across diverse hardware architectures. Contribute to vLLM's core codebase and implement research-driven inference techniques.

Key Highlights

Optimize LLM and diffusion model serving at the core of vLLM

Work with transformer architectures and PyTorch internals

Implement KV-cache memory management and prefix caching

Develop performant code for multimodal inference systems

Key Responsibilities

Optimize how models execute across diverse hardware and architectures

Implement performant and maintainable code for complex ML codebases

Debug and contribute to vLLM's core inference engine

Technical Skills Required

Python, PyTorch, LLM inference systems, KV-cache memory management

Benefits & Perks

Medical coverage

Dental coverage

Vision coverage

Nice to Have

RL frameworks and algorithms for LLMs

Multimodal inference (audio/image/video/text)

Contributions to open-source ML or system infrastructure projects

Implemented core features in vLLM or other inference engine projects

Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory)

Written widely-shared technical blogs or side projects on vLLM or LLM inference

Job Description

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.

About The Role

We're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models grow larger. Architectures shift: mixture-of-experts, multimodal, agentic. Every breakthrough demands innovations on the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference.

Skills And Qualifications

Minimum qualifications:

Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
Deep understanding of transformer architectures and their variants.
Strong programming skills in Python with experience in PyTorch internals.
Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).
Ability to read and implement model architectures and inference techniques from research papers.
Demonstrated ability to contribute performant and maintainable code and debug in complex ML codebases.

Preferred qualifications:

Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.
Familiarity with RL frameworks and algorithms for LLMs.
Experience with multimodal inference (audio/image/video/text).
Contributions to open-source ML or system infrastructure projects.

Bonus points if you have:

Implemented core features in vLLM or other inference engine projects.
Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc.).
Written widely-shared technical blogs or side projects on vLLM or LLM inference.

Logistics

Location: This role is based in Singapore.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is S$200,000 to S$400,000, plus equity.
Visa sponsorship: We sponsor visas on a case-by-case basis.
Benefits: Inferact offers a generous benefits package, including medical, dental, and vision coverage.

Compensation Range: SGD 200K - SGD 400K

Job Overview

Posted Date: Jun 24, 2026

Employment Type: Full-time

Experience Level: Not Applicable

Location: Singapore

Annual Salary: 200,000 - 400,000 SGD

Category: Programming

Company: Inferact
