Network Engineer - AI Infrastructure & High-Performance Computing
Up to $250K base + 2x base in equity + bonus + much more!
About the Role
My client is building the network backbone for next-generation AI and machine learning infrastructure. This isn't traditional enterprise networking - you'll be designing and deploying the high-speed network fabrics that connect GPU clusters for large-scale training workloads. If you've worked with HPC environments, GPU clusters, or high-performance datacenter fabrics, this role could be a great fit.
What You'll Be Building
- High-performance datacenter fabrics - Spine-leaf topologies optimized for GPU-to-GPU communication using modern protocols
- Physical network infrastructure - Rack-level designs, structured cabling, and acceptance testing in production datacenters
- Lossless Ethernet networks - Configure congestion control mechanisms for zero packet loss in AI training workloads
- Automated provisioning systems - Infrastructure as Code using Ansible, Terraform, and Python to manage network devices
- Performance validation - Throughput testing with iperf3, NCCL benchmarks, and monitoring with Prometheus/Grafana
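As a sketch of the performance-validation work above: iperf3 can emit JSON (`iperf3 -c <server> -J`), which makes acceptance checks scriptable. The 350 Gbps threshold below is an illustrative assumption, not a figure from this role.

```python
import json

def throughput_gbps(iperf3_json: dict) -> float:
    """Extract end-to-end receive throughput (Gbps) from iperf3 -J output."""
    bps = iperf3_json["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9

def passes_acceptance(iperf3_json: dict, min_gbps: float = 350.0) -> bool:
    """Pass/fail against an assumed minimum-throughput threshold."""
    return throughput_gbps(iperf3_json) >= min_gbps

# Usage: result = json.load(open("result.json")); passes_acceptance(result)
```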
This Role Is For You If You Have:
✓ Experience with High-Performance Workloads
- Worked with GPU cluster networks, HPC environments, or high-throughput computing infrastructure
- Exposure to InfiniBand (EDR/HDR/NDR) or RDMA over Ethernet (RoCEv2) - even in lab/testing environments
- Some understanding of lossless Ethernet concepts - Priority Flow Control (PFC), Explicit Congestion Notification (ECN)
- Interest in high-bandwidth, low-latency networking for compute-intensive applications
✓ Strong Datacenter Fabric Experience
- Deployed spine-leaf architectures in production (doesn't need to be massive scale)
- Configured VXLAN/EVPN with BGP routing for modern datacenters
- Worked with Arista switches OR Mellanox/NVIDIA networking gear (or willingness to learn)
- Built non-blocking fabrics or understand oversubscription for compute workloads
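To make the oversubscription point concrete, here is a minimal sketch of the standard leaf-switch calculation: the ratio of server-facing to fabric-facing bandwidth. The port counts in the usage comments are illustrative examples, not this client's design.

```python
def oversubscription_ratio(downlink_count: int, downlink_gbps: float,
                           uplink_count: int, uplink_gbps: float) -> float:
    """Ratio of downlink (server-facing) to uplink (spine-facing) bandwidth
    on a leaf switch. 1.0 is non-blocking; values above 1.0 mean the fabric
    is oversubscribed by that factor."""
    return (downlink_count * downlink_gbps) / (uplink_count * uplink_gbps)

# Example: 32x400G down, 32x400G up -> 1.0 (non-blocking, typical for GPU fabrics)
# Example: 48x100G down, 6x400G up  -> 2.0 (2:1 oversubscribed)
```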
✓ Hands-On Infrastructure Skills
- Experience with physical datacenter implementations - racking equipment, cable management, labeling
- Comfortable with structured cabling - fiber optics, copper, DAC cables
- Can perform acceptance testing - link validation, basic throughput checks
- Willing to travel to datacenters as needed for hands-on work
✓ Network Automation Experience
- Built configurations using Ansible, Terraform, or Python
- Comfortable with command-line interfaces and scripting
- Experience with version control (Git) for network configs
- Familiarity with monitoring tools - Prometheus, Grafana, or similar platforms
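In the spirit of the automation work above, here is a hedged sketch of config generation: rendering a per-switch VXLAN VLAN-to-VNI mapping from inventory data, using only the standard library rather than Ansible/Jinja2. The Arista-style syntax and the VNI-numbering convention (VNI = 10000 + VLAN ID) are illustrative assumptions.

```python
VNI_BASE = 10000  # assumed site convention: VNI = VNI_BASE + VLAN ID

def render_vxlan_config(vlans: list[int]) -> str:
    """Render an EOS-style Vxlan1 interface stanza for the given VLANs."""
    lines = [
        "interface Vxlan1",
        "   vxlan source-interface Loopback1",
    ]
    for vlan in vlans:
        lines.append(f"   vxlan vlan {vlan} vni {VNI_BASE + vlan}")
    return "\n".join(lines)
```

Templating configs from data like this is what makes Git diffs of network state meaningful: review the inventory change, regenerate, and push.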
Required Technical Skills
Core Networking (Must Have):
- 5+ years working with datacenter networks
- Strong Layer 2/3 networking - BGP, OSPF, VXLAN, EVPN
- Experience with spine-leaf topologies in production
- Understanding of routing protocols and modern datacenter designs
High-Performance Networking (Preferred):
- Experience with one or more of the following:
  - InfiniBand networks (any exposure counts)
  - RoCEv2 / RDMA networking
  - GPU cluster connectivity
  - HPC interconnects or high-bandwidth environments
- Basic understanding of congestion control - ECN, PFC, jumbo frames
- Interest in learning about lossless transport for RDMA workloads
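For the ECN side of the congestion-control basics above: per RFC 3168, the ECN codepoint lives in the two low-order bits of the IP TOS/traffic-class byte. A small helper like this (an illustrative sketch) is handy for sanity-checking captured headers when debugging RoCEv2 marking.

```python
# RFC 3168 ECN codepoints (low two bits of the TOS/traffic-class byte)
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # not ECN-capable transport
    0b01: "ECT(1)",   # ECN-capable transport
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced (marked by a switch)
}

def ecn_codepoint(tos_byte: int) -> str:
    """Return the ECN codepoint name for an IP TOS/traffic-class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]
```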
Vendor Platforms (Experience with One or More):
- Arista switches - 7000/8000 series preferred, but any Arista experience valuable
- Mellanox/NVIDIA networking - Spectrum switches, ConnectX NICs (even basic exposure)
- Cisco Nexus datacenter switches - 9K/7K series
- Palo Alto firewalls - configuration and policy management
Automation & Tooling (Must Have at Least Two):
- Infrastructure as Code - Ansible, Terraform, or Python scripting
- Version control - Git for managing network configurations
- Linux familiarity - comfortable in command-line environments
- Monitoring tools - Prometheus, Grafana, SolarWinds, or similar
- DCIM platforms - NetBox, Device42, or asset management tools
Nice to Have (Bonus Skills)
- InfiniBand experience - any hands-on work with IB fabrics
- RoCEv2 or RDMA - configuration or testing experience
- GPU cluster networking - NVIDIA NVLink, GPUDirect
- Performance testing tools - iperf3, NCCL tests, network benchmarking
- Bare metal provisioning - PXE boot, Redfish/IPMI
- Cloud networking - AWS, Azure hybrid connectivity
- Multi-tenant environments - namespace isolation, traffic segmentation
Day-to-Day Responsibilities
- Design and deploy spine-leaf fabrics for AI cluster deployments
- Configure Arista or Mellanox switches with VXLAN/EVPN overlays
- Implement network configurations for GPU cluster connectivity
- Perform rack-level implementations - cable routing, labeling, testing
- Validate network performance using standard testing tools
- Automate network provisioning using Ansible/Terraform
- Monitor network health and troubleshoot performance issues
- Work with datacenter teams on infrastructure requirements
- Implement firewall policies for network segmentation
- Document network designs and maintain topology diagrams
- Travel to datacenters as needed (up to 20% travel)
Compensation & Benefits
- Base Salary: Competitive, up to $250K depending on experience
- Full benefits package - health, dental, vision, 401(k)
- Relocation assistance available if needed
- Professional development budget for certifications and training
- Opportunity to work with cutting-edge AI infrastructure
What You'll Get
✓ Modern technology - Work with 400G networking, GPU architectures, automation tools
✓ High-impact work - Networks you build enable AI research and breakthroughs
✓ Greenfield deployments - Design networks without legacy constraints
✓ Technical focus - Engineering-first culture, minimal bureaucracy
✓ Career growth - Become an expert in AI infrastructure networking
✓ Learning opportunities - Hands-on with InfiniBand, RDMA, GPU networking technologies
How to Apply
If you have datacenter networking experience and an interest in HPC or GPU infrastructure, I'd love to hear from you.