Manage a GPU-accelerated LLM inference platform: provision and maintain GPU servers and deploy LLM inference engines. Requires strong NVIDIA GPU stack knowledge, 4+ years of Linux systems engineering experience, and 2+ years with GPU or ML/AI infrastructure.
Job Description
Description:
Our client in KSA is seeking an AI Operations Engineer to manage a GPU-accelerated LLM inference platform. You'll own the full stack, from bare-metal provisioning to production model deployment, monitoring, and performance optimization. Open to candidates willing to relocate.
Role:
- Provision and maintain GPU servers end-to-end: OS hardening, NVIDIA drivers, inference engine deployment
- Deploy and operate LLM inference engines with multi-GPU sharding and quantization strategies
- Manage an API gateway for load balancing, model routing, and per-application usage tracking
- Own observability: GPU telemetry, latency metrics (p50/p95/p99), cost attribution, and alerting
- Handle offline/air-gapped deployments with no internet dependency on production nodes
- Benchmark new models, plan fleet capacity, and advise dev teams on prompt and parameter tuning
- Support fine-tuning workflows (LoRA/QLoRA) and deploy fine-tuned models to production
Qualifications:
- 4+ years in Linux systems engineering; 2+ years with GPU or ML/AI infrastructure
- Hands-on experience deploying LLM inference engines in production
- Strong NVIDIA GPU stack knowledge: drivers, toolkits, runtime libraries
- Proficient in Ansible (or similar IaC), shell scripting, and Python operational tooling
- Solid networking fundamentals: reverse proxy, TLS, HTTP/SSE, load balancing
- Experience with containerisation, service management, and database backends
- Clear communicator; comfortable working independently in restricted-network environments
Nice to have:
- Arabic NLP or multilingual model evaluation experience
- Familiarity with MoE architectures or LLM API gateway/proxy solutions
- Prior air-gapped or data-sovereign deployment experience