AI Summary
Multiverse Computing is seeking a Senior Engineer to lead the development of the software layer for AI Gigafactory. The successful candidate will design and develop the control plane, orchestrate high-scale compute, and optimize the fabric.
Key Highlights
Design and develop the control plane for AI Gigafactory
Orchestrate high-scale compute for large-scale distributed training jobs
Optimize the fabric for low-latency interconnects
Job Description
Multiverse Computing
Multiverse is a well-funded, fast-growing deep-tech company founded in 2019. We are the largest quantum software company in the EU and have been recognized by CB Insights (2023 and 2025) as one of the 100 most promising AI companies in the world.
With 180+ employees and growing, our team is fully multicultural and international. We deliver hyper-efficient software for companies seeking a competitive edge through quantum computing and artificial intelligence.
Our flagship products, CompactifAI and Singularity, address critical needs across various industries:
CompactifAI is a groundbreaking compression tool for foundational AI models based on Tensor Networks. It enables the compression of large AI systems—such as language models—to make them significantly more efficient and portable.
Singularity is a quantum- and quantum-inspired optimization platform used by blue-chip companies to solve complex problems in finance, energy, manufacturing, and beyond. It integrates seamlessly with existing systems and delivers immediate performance gains on classical and quantum hardware.
You’ll be working alongside world-leading experts to develop solutions that tackle real-world challenges. We’re looking for passionate individuals eager to grow in an ethics-driven environment that values sustainability and diversity.
We’re committed to building a truly inclusive culture—come and join us.
Role Description
We are looking for a Senior Engineer to lead a critical initiative within our Platform Engineering team: building the software layer for AI Gigafactory. In this role, you will move beyond consuming public cloud resources to architecting and building a private "Neo-cloud" from the ground up. You will design the control planes that manage high-performance compute clusters, orchestrate thousands of GPUs, and optimize the hardware-software interface for massive AI workloads.
This role sits at the intersection of High-Performance Computing (HPC), Kubernetes Internals, and Bare Metal Engineering.
What You Will Be Doing
- Building the Control Plane: Designing and developing the software layer (APIs, Controllers, Agents) that automates the lifecycle of bare-metal AI infrastructure.
- Orchestrating High-Scale Compute: Architecting scheduling solutions for large-scale distributed training jobs across massive clusters of GPUs (NVIDIA H200/B200/B300), ensuring efficient bin-packing and gang scheduling.
- Optimizing the Fabric: Tuning the software-defined networking layer to support low-latency interconnects (InfiniBand/RDMA/RoCEv2) essential for multi-node training.
- Developing Kubernetes Extensions: Writing custom Kubernetes Operators and CRDs to abstract complex hardware realities (topology awareness, GPU partitioning) into usable interfaces for our Data Scientists.
- Hardware-Level Debugging: Investigating and resolving deep systems issues, ranging from PCIe bus errors and NCCL communication timeouts to kernel panics on bare-metal nodes.
- Defining Standards: Creating the "Golden Image" for AI workloads, managing drivers, firmware, and OS optimizations to squeeze maximum performance out of the hardware.
Qualifications
- Systems Programming Expertise: 10+ years of software engineering experience with strong proficiency in Go (Golang), C++, or Rust. You must be comfortable building system agents, APIs, and CLI tools.
- Deep Kubernetes Knowledge: You understand K8s internals beyond simple deployment. Experience with Custom Resource Definitions (CRDs), Operators, and the Kubernetes API server architecture.
- GPU Ecosystem Experience: Hands-on experience managing NVIDIA GPU clusters. Familiarity with NVIDIA drivers, CUDA toolkit, and the container runtime (NVIDIA Container Toolkit).
- Linux Internals: Deep understanding of the Linux kernel, cgroups, namespaces, and system performance tuning.
- Infrastructure as Code: Mastery of declarative infrastructure tools (Terraform, Ansible), with a focus on provisioning physical hardware rather than just cloud VMs.
- Problem Solving: A proven track record of debugging complex distributed systems where the root cause could be code, network, or silicon.
- HPC Background: Experience working with traditional supercomputing schedulers (Slurm, PBS) or modern batch schedulers (Volcano, Kueue, Ray).
- Bare Metal Provisioning: Experience with tools like Cluster API (CAPI), Metal3, Tinkerbell, Canonical MAAS, or OpenStack Ironic.
- High-Speed Networking: Knowledge of RDMA, InfiniBand, GPUDirect, and how to expose these technologies to containerized workloads.
- AI/ML Familiarity: Understanding of how distributed training works (e.g., PyTorch Distributed, Megatron-LM, DeepSpeed) and the infrastructure requirements of Large Language Models (LLMs).
- Observability: Experience building monitoring for hardware health (DCGM) and distributed tracing for long-running jobs.
Perks & Benefits
- Indefinite contract.
- Equal pay guaranteed.
- Variable performance bonus.
- Signing bonus.
- Relocation package (if applicable).
- Private health insurance.
- Eligibility for educational budget according to internal policy.
- Hybrid opportunity.
- Flexible working hours.
- A fast-paced environment working on cutting-edge technologies.
- Career plan. Opportunity to learn and teach.
- Progressive company. Happy people culture.
Come and join our multicultural team!
5 locations
+27 languages