IOC Systems Specialist - HPC & AI Infrastructure Operations

Optomi • United States

Relocation

AI Summary

Monitor and troubleshoot high-performance compute (HPC) and AI cluster environments in a 24×7 operations center. Ensure the reliability of distributed HPC systems and enterprise storage platforms. Support Kubernetes and Slurm-based compute environments, using Grafana for monitoring and Jira for incident management.

Key Highlights

24×7 operations center support for AI/HPC infrastructure

Tier 2 IOC/NOC monitoring and troubleshooting

Enterprise storage systems (WEKA, VAST, Dell PowerScale)

Kubernetes and Slurm orchestration environments

Key Responsibilities

Monitor and troubleshoot HPC and AI cluster environments in a Tier 2 IOC/NOC setting

Support and troubleshoot enterprise storage systems (WEKA, VAST, Dell Isilon/PowerScale, or similar SAN/NAS)

Investigate performance, connectivity, and network-related storage issues (including VLAN and configuration validation)

Technical Skills Required

HPC infrastructure

Storage systems

Networking (VLANs)

Kubernetes

Benefits & Perks

Relocation assistance available

Nice to Have

Slurm-based compute environments

Grafana monitoring tools

Jira ticketing systems

Job Description

Optomi, in partnership with our client, is seeking an IOC Systems Specialist to support a large-scale AI/HPC infrastructure environment focused on high-performance compute and data-intensive workloads.

This role sits in a 24×7 operations center and is responsible for monitoring, troubleshooting, and ensuring reliability of distributed HPC systems and enterprise storage platforms.

Onsite in Fort Worth, TX (relocation assistance available)
Direct Hire
On-call rotation

What you’ll do:

Monitor and troubleshoot HPC and AI cluster environments in a Tier 2 IOC/NOC setting
Support and troubleshoot enterprise storage systems (WEKA, VAST, Dell Isilon/PowerScale, or similar SAN/NAS)
Investigate performance, connectivity, and network-related storage issues (including VLAN and configuration validation)
Work with Kubernetes and Slurm-based compute environments
Use monitoring tools (Grafana) and ticketing systems (Jira) for incident management
Perform root cause analysis and collaborate with engineering teams for resolution
Ensure system health, uptime, and performance across distributed infrastructure

What we’re looking for:

Experience supporting enterprise storage or data center storage environments
Strong troubleshooting skills across storage, network, and compute systems
Familiarity with HPC or high-throughput infrastructure environments
Understanding of networking concepts (VLANs, connectivity, throughput)
Experience in operational support environments (IOC/NOC or Tier 2 support)
Exposure to Kubernetes, Slurm, or similar orchestration/workload tools is a plus

Join a cutting-edge AI infrastructure company building sustainable, large-scale GPU compute environments that power next-generation workloads.

Job Overview

Posted Date: Jul 02, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: United States

Category: Networking

Company: Optomi

