Highly experienced Network Engineer sought for a challenging role in datacenter operations. Key responsibilities include owning network operations for a datacenter region, responding to complex incidents, and coordinating repair and recovery. The successful candidate will have strong production ops experience, hands-on expertise in EVPN/VXLAN, and excellent troubleshooting skills.
Job Description
🛜 Network Engineer - Datacenter Operations
🤖 High-Growth AI Infrastructure
🇺🇸 United States - 30% Travel
💵 $150,000 - $250,000 + Equity + Benefits
Description:
A rapidly scaling AI-infrastructure company, backing many of the world’s leading research labs and next-generation AI builders, is seeking a Network Engineer focused on Operations and Repair.
They’re building colossal GPU clusters in the US - think 100k+ GPUs, liquid cooling, multi-GW power draw. This is the infrastructure that literally determines how fast the future gets built.
This role is for an experienced network operations engineer who wants true ownership. You’ll be the primary operator for a datacenter region, responsible for keeping large-scale network fabrics healthy, responding to complex incidents, and coordinating repair and recovery when things go wrong.
This is not a NOC role and not a design-only position. You’ll work closely with centralized monitoring teams, deployment engineers, and onsite operations to ensure production networks stay available and performant.
What you’ll do:
- Own network operations for an assigned datacenter region, supporting datacenter deployments, turn-ups, and expansions
- Act as Tier 2/3 escalation point for network incidents
- Troubleshoot complex L1–L3 and fabric-level issues
- Coordinate network break-fix with onsite teams and vendors
- Manage RMAs and vendor escalations
- Build and maintain regional/network observability dashboards
- Validate production readiness and operational handover
Requirements:
- 4+ years of network engineering with heavy production ops exposure
- Proven experience running and troubleshooting live datacenter networks
- Strong incident response and outage leadership experience
- Hands-on with EVPN/VXLAN, BGP, Clos fabrics, and high-radix switching
- Confident in troubleshooting L2/L3, routing, fabric, and physical faults
- Experience with SQL-backed dashboards (Grafana, Tableau, similar)
- Working knowledge of Python for ops, analysis, or scripting
- Pragmatic operator: prioritizes impact, documents as they go
- Comfortable with ~30–40% travel
Nice to have:
- AI/ML or HPC network operations (RDMA, RoCEv2, lossless Ethernet)
- Previous site, campus, or regional ops ownership
- Hands-on hardware break-fix and RMA coordination
- Experience with network monitoring, alerting, and telemetry
- Follow-the-sun or globally distributed ops experience
Compensation:
- $150k–$260k + meaningful equity
- Generous PTO policy
- Remote flexibility available, though in-office presence is encouraged.