Datacenter Infrastructure Reliability Lead

Doghouse Recruitment • Netherlands

Relocation

This job is no longer active and is no longer accepting applications.

AI Summary

Lead the highest escalation layer for critical infrastructure incidents across global datacenters. Build and lead the L3 support team across regions. Design and enforce incident response and escalation frameworks.

Key Highlights

Lead L3 support team

Design incident response frameworks

Act as Incident Commander

Key Responsibilities

Build, lead, and scale the L3 support team

Design and enforce incident response and escalation frameworks

Act as Incident Commander for high-severity production incidents

Technical Skills Required

Linux

Server hardware

Firmware (BIOS/BMC)

GPU server platforms

nvidia-smi

dcgmi

Linux log correlation

Benefits & Perks

Up to 200k base

25% bonus

RSUs

Relocation package provided

Hybrid work arrangement (50/50 in office)

Nice to Have

Deep troubleshooting capability across Linux, server hardware, and firmware (BIOS/BMC)

Strong familiarity with GPU server platforms and common diagnostics

Job Description

Datacenter Infrastructure Reliability Lead

Location: Amsterdam, Netherlands – Hybrid (50/50 in office)

Relocation: possible and supported.

Compensation: Up to 200k base + 25% bonus + RSUs

Join a fast-growing AI infrastructure company building large-scale GPU and datacenter platforms from the ground up. This role is ideal for experienced infrastructure leaders who enjoy solving complex production issues, building teams from scratch, and operating at the intersection of hardware, Linux systems, and large-scale datacenter operations.

You will lead the highest escalation layer for critical infrastructure incidents across global datacenters.

Role Overview

Your team will be the final escalation point for anything related to datacenter IT hardware infrastructure (modern servers, GPUs, racks, networking, storage, etc.). Anything L2 cannot resolve will be escalated to this team.

This L3 team is not yet in place — you will be responsible for building and leading it from scratch.

Responsibilities

Build, lead, and scale the L3 support team across regions, with full ownership of hiring, team structure, and performance
Design and enforce the end-to-end incident response and escalation framework, including workflows, ownership models, and KPIs, and ensure its adoption across multiple teams

Act as Incident Commander for high-severity production incidents, driving structured mitigation, clear communication, and long-term resolution
Own problem management and continuous improvement, identifying recurring failure patterns and translating them into scalable fixes across infrastructure and operations

What We’re Looking For

10+ years of experience in large-scale datacenter environments
3+ years of experience leading highly technical teams

3+ years of experience building teams (hiring and performance management)
Experience setting up frameworks, processes, and workflows from scratch

Nice to have:

Deep troubleshooting capability across Linux, server hardware, and firmware (BIOS/BMC), with the ability to guide investigations at a systems engineer level
Strong familiarity with GPU server platforms and common diagnostics (e.g. nvidia-smi, dcgmi, Linux log correlation)

Job Overview

Posted Date: Apr 03, 2026

Employment Type: Full-time

Experience Level: Director

Location: Netherlands

Annual Salary: 200,000 USD

Category: Programming

Company: Doghouse Recruitment


