Network Engineer - Datacenter Operations

Realm United States
Remote
AI Summary

Highly experienced Network Engineer sought for a challenging role in datacenter operations. Key responsibilities include owning network operations for a datacenter region, responding to complex incidents, and coordinating repair and recovery. The successful candidate will have strong production ops experience, hands-on expertise in EVPN/VXLAN, and excellent troubleshooting skills.

Key Highlights
High-Growth AI Infrastructure
Network Operations and Repair
Datacenter Region Ownership
Technical Skills Required
EVPN, VXLAN, BGP, CLOS, high-radix switching, Python, SQL-backed dashboards, Grafana, Tableau, .NET
Benefits & Perks
Salary $150,000 - $250,000
Meaningful equity
Generous PTO policy
Remote flexibility
In-office presence encouraged

Job Description


🛜 Network Engineer - Datacenter Operations

🤖 High-Growth AI Infrastructure

🇺🇸 United States - 30% Travel

💵 $150,000 - $250,000 + Equity + Benefits


Description:

A rapidly scaling AI-infrastructure company, backing many of the world’s leading research labs and next-generation AI builders, is seeking a Network Engineer focused on Operations and Repair.


They’re building colossal GPU clusters in the US - think 100k+ GPUs, liquid cooling, multi-GW power draw. This is the infrastructure that literally determines how fast the future gets built.

This role is for an experienced network operations engineer who wants true ownership. You’ll be the primary operator for a datacenter region, responsible for keeping large-scale network fabrics healthy, responding to complex incidents, and coordinating repair and recovery when things go wrong.


This is not a NOC role and not a design-only position. You’ll work closely with centralized monitoring teams, deployment engineers, and onsite operations to ensure production networks stay available and performant.


What you’ll do

  • Own network operations for an assigned datacenter region, supporting datacenter deployments, turn-ups, and expansions
  • Act as Tier 2/3 escalation point for network incidents
  • Troubleshoot complex L1–L3 and fabric-level issues
  • Coordinate network break-fix with onsite teams and vendors
  • Manage RMAs and vendor escalations
  • Build and maintain regional/network observability dashboards
  • Validate production readiness and operational handover


Requirements:

  • 4+ years of network engineering with heavy production ops exposure
  • Proven experience running and troubleshooting live datacenter networks
  • Strong incident response and outage leadership experience
  • Hands-on with EVPN/VXLAN, BGP, CLOS, high-radix switching
  • Confident in troubleshooting L2/L3, routing, fabric, and physical faults
  • Experience with SQL-backed dashboards (Grafana, Tableau, similar)
  • Working knowledge of Python for ops, analysis, or scripting
  • Pragmatic operator: prioritizes impact, documents as they go
  • Comfortable with ~30–40% travel


Nice to have

  • AI/ML or HPC network operations (RDMA, RoCEv2, lossless Ethernet)
  • Previous site, campus, or regional ops ownership
  • Hands-on hardware break-fix and RMA coordination
  • Experience with network monitoring, alerting, and telemetry
  • Follow-the-sun or globally distributed ops experience


Compensation:

  • $150k–$260k + meaningful equity
  • Generous PTO policy
  • Remote flexibility available, though in-office presence is encouraged

