Senior Network Engineer - AI Infrastructure & High-Performance Computing

asobbi • United States

Relocation

This job is no longer active. This position is no longer accepting applications.

AI Summary

Design and deploy high-performance datacenter fabrics for AI cluster deployments. Configure Arista or Mellanox switches with VXLAN/EVPN overlays. Implement network configurations for GPU cluster connectivity.

Key Highlights

High-performance datacenter fabrics design and deployment

Arista or Mellanox switch configuration with VXLAN/EVPN overlays

GPU cluster connectivity network configuration

Network automation using Ansible, Terraform, or Python scripting

Performance validation using iperf3, NCCL benchmarks, and monitoring with Prometheus/Grafana

Technical Skills Required

Ansible, Terraform, Python, BGP, OSPF, VXLAN, EVPN, InfiniBand, RoCEv2, RDMA, GPU cluster connectivity, HPC interconnects, Linux, Git, Prometheus, Grafana, NetBox, Device42

Benefits & Perks

Competitive base salary up to $200K

Full benefits package

Relocation assistance available

Professional development budget for certifications and training

Job Description

Network Engineer - AI Infrastructure & High-Performance Computing

Up to $250K Base + 2x basic in equity + bonus + Much more!

About the Role

My client is building the network backbone for next-generation AI and machine learning infrastructure. This isn't traditional enterprise networking - you'll be designing and deploying the high-speed network fabrics that connect GPU clusters for large-scale training workloads. If you've worked with HPC environments, GPU clusters, or high-performance datacenter fabrics, this role could be a great fit.

What You'll Be Building

High-performance datacenter fabrics - Spine-leaf topologies optimized for GPU-to-GPU communication using modern protocols
Physical network infrastructure - Rack-level designs, structured cabling, and acceptance testing in production datacenters
Lossless Ethernet networks - Configure congestion control mechanisms for zero packet loss in AI training workloads
Automated provisioning systems - Infrastructure as Code using Ansible, Terraform, and Python to manage network devices
Performance validation - Throughput testing with iperf3, NCCL benchmarks, and monitoring with Prometheus/Grafana

This Role Is For You If You Have:

✓ Experience with High-Performance Workloads

Worked with GPU cluster networks, HPC environments, or high-throughput computing infrastructure
Exposure to InfiniBand (EDR/HDR/NDR) or RDMA over Converged Ethernet (RoCEv2) - even in lab/testing environments
Some understanding of lossless Ethernet concepts - Priority Flow Control (PFC), Explicit Congestion Notification (ECN)
Interest in high-bandwidth, low-latency networking for compute-intensive applications

✓ Strong Datacenter Fabric Experience

Deployed spine-leaf architectures in production (doesn't need to be massive scale)
Configured VXLAN/EVPN with BGP routing for modern datacenters
Worked with Arista switches OR Mellanox/NVIDIA networking gear (or willingness to learn)
Built non-blocking fabrics, or understand oversubscription trade-offs for compute workloads

✓ Hands-On Infrastructure Skills

Experience with physical datacenter implementations - racking equipment, cable management, labeling
Comfortable with structured cabling - fiber optics, copper, DAC cables
Can perform acceptance testing - link validation, basic throughput checks
Willing to travel to datacenters as needed for hands-on work

✓ Network Automation Experience

Built configurations using Ansible, Terraform, or Python
Comfortable with command-line interfaces and scripting
Experience with version control (Git) for network configs
Familiarity with monitoring tools - Prometheus, Grafana, or similar platforms

Required Technical Skills

Core Networking (Must Have):

5+ years working with datacenter networks
Strong Layer 2/3 networking - BGP, OSPF, VXLAN, EVPN
Experience with spine-leaf topologies in production
Understanding of routing protocols and modern datacenter designs

High-Performance Networking (Preferred):

Experience with one or more of the following:
InfiniBand networks (any exposure counts)
RoCEv2 / RDMA networking
GPU cluster connectivity
HPC interconnects or high-bandwidth environments
Basic understanding of congestion control - ECN, PFC, jumbo frames
Interest in learning about lossless transport for RDMA workloads

Vendor Platforms (Experience with One or More):

Arista switches - 7000/8000 series preferred, but any Arista experience valuable
Mellanox/NVIDIA networking - Spectrum switches, ConnectX NICs (even basic exposure)
Cisco Nexus datacenter switches - 9K/7K series
Palo Alto firewalls - configuration and policy management

Automation & Tooling (Must Have at Least Two):

Infrastructure as Code - Ansible, Terraform, or Python scripting
Version control - Git for managing network configurations
Linux familiarity - comfortable in command-line environments
Monitoring tools - Prometheus, Grafana, SolarWinds, or similar
DCIM platforms - NetBox, Device42, or asset management tools

Nice to Have (Bonus Skills)

InfiniBand experience - any hands-on work with IB fabrics
RoCEv2 or RDMA - configuration or testing experience
GPU cluster networking - NVIDIA NVLink, GPUDirect
Performance testing tools - iperf3, NCCL tests, network benchmarking
Bare metal provisioning - PXE boot, Redfish/IPMI
Cloud networking - AWS, Azure hybrid connectivity
Multi-tenant environments - namespace isolation, traffic segmentation

Day-to-Day Responsibilities

Design and deploy spine-leaf fabrics for AI cluster deployments
Configure Arista or Mellanox switches with VXLAN/EVPN overlays
Implement network configurations for GPU cluster connectivity
Perform rack-level implementations - cable routing, labeling, testing
Validate network performance using standard testing tools
Automate network provisioning using Ansible/Terraform
Monitor network health and troubleshoot performance issues
Work with datacenter teams on infrastructure requirements
Implement firewall policies for network segmentation
Document network designs and maintain topology diagrams
Travel to datacenters as needed (up to 20% travel)

Compensation & Benefits

Base Salary: Competitive, up to $200K depending on experience
Full benefits package - health, dental, vision, 401(k)
Relocation assistance available if needed
Professional development budget for certifications and training
Opportunity to work with cutting-edge AI infrastructure

What You'll Get

✓ Modern technology - Work with 400G networking, GPU architectures, automation tools

✓ High-impact work - Networks you build enable AI research and breakthroughs

✓ Greenfield deployments - Design networks without legacy constraints

✓ Technical focus - Engineering-first culture, minimal bureaucracy

✓ Career growth - Become an expert in AI infrastructure networking

✓ Learning opportunities - Hands-on with InfiniBand, RDMA, GPU networking technologies

How to Apply

If you have datacenter networking experience and an interest in HPC or GPU infrastructure, I'd love to hear from you.

Job Overview

Posted Date: Dec 01, 2025

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: United States

Category: Networking

Company: asobbi
