AI Operations Engineer

Egypt
Relocation
AI Summary

Manage a GPU-accelerated LLM inference platform: provision and maintain GPU servers and deploy LLM inference engines. Requires strong NVIDIA GPU stack knowledge, 4+ years of Linux systems engineering experience, and 2+ years with GPU or ML/AI infrastructure.

Key Highlights
Manage a GPU-accelerated LLM inference platform
Provision and maintain GPU servers
Deploy LLM inference engines
Key Responsibilities
Provision and maintain GPU servers end-to-end
Deploy and operate LLM inference engines with multi-GPU sharding and quantization strategies
Manage an API gateway for load balancing, model routing, and per-application usage tracking
Own observability: GPU telemetry, latency metrics (p50/p95/p99), cost attribution, and alerting
Handle offline/air-gapped deployments with no internet dependency on production nodes
Benchmark new models, plan fleet capacity, and advise dev teams on prompt and parameter tuning
Support fine-tuning workflows (LoRA/QLoRA) and deploy fine-tuned models to production
Technical Skills Required
Linux systems engineering, NVIDIA GPU stack, Ansible, shell scripting, Python operational tooling, containerisation, service management, database backends
Experience Required
4+ years of experience in Linux systems engineering
2+ years with GPU or ML/AI infrastructure
Strong NVIDIA GPU stack knowledge
Nice to Have
Arabic NLP or multilingual model evaluation experience
Familiarity with MoE architectures or LLM API gateway/proxy solutions
Prior air-gapped or data-sovereign deployment experience

Job Description


Description:

Our client in KSA is seeking an AI Operations Engineer to manage a GPU-accelerated LLM inference platform. You'll own the full stack, from bare-metal provisioning to production model deployment, monitoring, and performance optimization. Open to candidates willing to relocate.


Role:

  • Provision and maintain GPU servers end-to-end: OS hardening, NVIDIA drivers, inference engine deployment
  • Deploy and operate LLM inference engines with multi-GPU sharding and quantization strategies
  • Manage an API gateway for load balancing, model routing, and per-application usage tracking
  • Own observability: GPU telemetry, latency metrics (p50/p95/p99), cost attribution, and alerting (see the telemetry sketch after this list)
  • Handle offline/air-gapped deployments with no internet dependency on production nodes
  • Benchmark new models, plan fleet capacity, and advise dev teams on prompt and parameter tuning
  • Support fine-tuning workflows (LoRA/QLoRA) and deploy fine-tuned models to production
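
To give a feel for the observability and Python operational tooling this role involves, here is a minimal sketch that polls per-GPU telemetry and summarizes request latencies as p50/p95/p99. It assumes the NVML Python bindings (the pynvml / nvidia-ml-py package) are installed on a node with NVIDIA drivers; the latency samples are hypothetical, and this is an illustration rather than the client's actual tooling.

  # Minimal sketch: poll per-GPU telemetry via NVML and summarize request
  # latencies. Assumes the pynvml package and an NVIDIA driver are present;
  # the latency samples below are illustrative placeholders.
  import statistics
  import pynvml

  def collect_gpu_telemetry():
      """Return per-GPU utilization, memory, and power readings."""
      pynvml.nvmlInit()
      readings = []
      try:
          for i in range(pynvml.nvmlDeviceGetCount()):
              handle = pynvml.nvmlDeviceGetHandleByIndex(i)
              util = pynvml.nvmlDeviceGetUtilizationRates(handle)
              mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
              readings.append({
                  "gpu": i,
                  "util_pct": util.gpu,
                  "mem_used_gib": mem.used / 2**30,
                  "mem_total_gib": mem.total / 2**30,
                  "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000,
              })
      finally:
          pynvml.nvmlShutdown()
      return readings

  def latency_percentiles(samples_ms):
      """Compute the p50/p95/p99 latency metrics mentioned in the role."""
      qs = statistics.quantiles(samples_ms, n=100)
      return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

  if __name__ == "__main__":
      for r in collect_gpu_telemetry():
          print(r)
      # Hypothetical latency samples (ms) for one model endpoint.
      print(latency_percentiles([120, 135, 180, 210, 95, 400, 150, 170, 220, 260]))

In practice, readings like these would be exported to a metrics backend and tied to alerting thresholds rather than printed.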


Qualifications

  • 4+ years in Linux systems engineering; 2+ years with GPU or ML/AI infrastructure
  • Hands-on experience deploying LLM inference engines in production
  • Strong NVIDIA GPU stack knowledge: drivers, toolkits, runtime libraries
  • Proficient in Ansible (or similar IaC), shell scripting, and Python operational tooling
  • Solid networking fundamentals: reverse proxy, TLS, HTTP/SSE, load balancing (a brief probe sketch follows this list)
  • Experience with containerisation, service management, and database backends
  • Clear communicator; comfortable working independently in restricted-network environments
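
As a rough illustration of the HTTP/SSE and Python operational tooling items above, the sketch below sends a tiny streamed request to an inference gateway and reports time-to-first-token. The gateway URL, model name, and token are hypothetical placeholders, and the endpoint is assumed to follow the common OpenAI-style SSE chat-completions format; the client's actual gateway may differ.

  # Minimal sketch of an HTTP/SSE health probe against an OpenAI-compatible
  # inference endpoint. URL, model name, and token are placeholders.
  import json
  import time
  import requests

  GATEWAY_URL = "https://gateway.example.internal/v1/chat/completions"  # hypothetical
  HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

  def probe_streaming_latency(model="example-model", timeout=30):
      """Send a small streamed request; report time-to-first-token and total time."""
      payload = {
          "model": model,
          "messages": [{"role": "user", "content": "ping"}],
          "stream": True,
          "max_tokens": 8,
      }
      start = time.monotonic()
      first_token = None
      with requests.post(GATEWAY_URL, headers=HEADERS, json=payload,
                         stream=True, timeout=timeout) as resp:
          resp.raise_for_status()
          for line in resp.iter_lines():
              if not line or not line.startswith(b"data: "):
                  continue
              if line == b"data: [DONE]":
                  break
              if first_token is None:
                  first_token = time.monotonic() - start
              json.loads(line[len(b"data: "):])  # sanity-check the chunk parses
      return {"ttft_s": first_token, "total_s": time.monotonic() - start}

  if __name__ == "__main__":
      print(probe_streaming_latency())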

Nice to have

  • Arabic NLP or multilingual model evaluation experience
  • Familiarity with MoE architectures or LLM API gateway/proxy solutions
  • Prior air-gapped or data-sovereign deployment experience
