Senior AI Systems Engineer

botify tech Germany
Relocation
This job is no longer active; the position is no longer accepting applications.
AI Summary

Botify Tech is recruiting a Senior AI Systems Engineer for a partner company to design and develop software for bare-metal AI infrastructure. The ideal candidate has 7+ years of software engineering experience and strong proficiency in Golang, C++, or Rust.

Key Highlights
Designing and developing software for bare-metal AI infrastructure
Developing the software layer (APIs, Controllers, Agents) that automates the lifecycle of AI infrastructure
Managing NVIDIA GPU clusters, with familiarity with the NVIDIA Container Toolkit
Technical Skills Required
Golang, C++, Rust, Kubernetes, NVIDIA GPU clusters, NVIDIA Container Toolkit, Terraform, Ansible
Benefits & Perks
€60,000 - €85,000 salary
20% Bonus
Relocation Package (if applicable)
Signing Bonus
Hybrid work arrangement (3 days/week in office)

Job Description


Senior AI Systems Engineer - Munich

€60,000 - €85,000 + 20% Bonus + Relocation Package (if applicable) + Signing Bonus

Location: Munich (hybrid; must attend the office 3 days a week)


Botify Tech has partnered with one of the top 10 AI businesses in Europe, which is looking for a Senior AI Systems Engineer.


Key Skills & Experience Required

  • 7+ years of software engineering experience with strong proficiency in Golang, C++, or Rust.
  • Designing and developing the software layer (APIs, Controllers, Agents) that automates the lifecycle of bare-metal AI infrastructure.
  • Deep experience with Kubernetes (K8s) internals, beyond simple deployment.
  • Hands-on experience managing NVIDIA GPU clusters and familiarity with the NVIDIA Container Toolkit.
  • Writing custom Kubernetes Operators and CRDs to abstract complex hardware realities (topology awareness, GPU partitioning) into usable interfaces for our AI engineers.
  • Deep experience with Terraform and Ansible, with a focus on provisioning physical hardware rather than just cloud VMs.
  • Architecting scheduling solutions for large-scale distributed training jobs across massive clusters of GPUs (NVIDIA H200/B200/B300), ensuring efficient bin-packing and gang scheduling.
  • Tuning the software-defined networking layer to support the low-latency interconnects (InfiniBand/RDMA/RoCEv2) that are essential for multi-node training.
  • Investigating and resolving deep system issues, ranging from PCIe bus errors and NCCL communication timeouts to kernel panics on bare-metal nodes.
  • Creating the "Golden Image" for AI workloads, managing drivers, firmware, and OS optimizations to squeeze maximum performance out of the hardware.


For more info

If you feel this is the role for you, or you know someone suitable for it, email me at pinto@botifytech.com

