Site Reliability Engineer (Kubernetes - HPC)

IREN, United States

Job Description

Job Type: Full-time | Location: Fort Worth | Department: IT | Reporting to: Senior Manager, Technical Operations Center | Work Location Type: Hybrid

IREN is a leading AI Cloud Service Provider, delivering large-scale GPU clusters for AI training and inference.  IREN’s vertically integrated platform is underpinned by its expansive portfolio of grid-connected land and data centers in renewable-rich regions across the U.S. and Canada.  

The Site Reliability Engineer (Kubernetes - HPC) will provide Tier 2 operational support for the IREN global fleet as part of a 24x7x365 incident response team. They will ensure the timely resolution of site- and customer-impacting events, engaging vendor and product Tier 3 support when appropriate. They are also responsible for the ongoing improvement and refinement of our monitoring and response alerting, so that we can provide immediate support for all possible events.

We build, own, and operate our data centers, powered by 100% renewable energy, and take pride in being at the forefront of sustainable solutions for the ever-evolving applications of high-performance compute. We believe that human progress is invaluable, but that it should be pursued the right way: responsibly, sustainably, and with a positive impact on the communities we operate in.

Job Requirements

  • Minimum of 3 to 5 years of experience in HPC system architecture, with proven expertise in designing, deploying, and managing HPC clusters.
  • Extensive knowledge of Kubernetes, with a focus on its integration within HPC environments.
  • Hands-on experience with the Slurm workload manager, or similar.
  • Familiarity with HPC management tools and software, ensuring efficient system monitoring and troubleshooting.
  • Proven track record of resolving complex system challenges and enhancing operational performance.
  • Understanding of cloud platforms and their integration into HPC ecosystems.
  • Deep knowledge of network and storage solutions commonly used in HPC setups.
  • A degree or diploma in computer science, engineering, or a combination of education and experience appropriate to the role.
  • Relevant certifications in Kubernetes, HPC technologies, or system architecture are advantageous.

Job Responsibilities

  • Response, triage, and resolution of operational incidents as part of a 24x7x365 response team, supporting escalations to Tier 3 product operations when appropriate.
  • Support the deployment and maintenance of HPC clusters, ensuring they operate effectively and maximize availability.
  • Manage HPC software components such as Kubernetes, Slurm, cluster management software, and any infrastructure required to operate the HPC environment.
  • Collaborate with product operations to ensure accurate monitoring and response for our global fleet.
  • Draft comprehensive documentation, including operational procedures and best-practice guidelines.
  • Provide technical leadership and training to other team members, fostering an environment of continuous learning and improvement.

Job Benefits

At IREN, we offer a highly competitive compensation package that includes base salary, annual performance incentives, and opportunities to build long-term wealth through equity programs. These offerings are part of our broader total rewards package, thoughtfully designed to support your health, well-being, and long-term success. 

Compensation 

  • The total compensation package may include an annual incentive bonus and equity (long-term incentive).
  • Relocation or living-out allowance / per diem (as applicable and based on the successful candidate's circumstances)

Health & Wellness 

  • 100% company-paid health insurance premiums (medical, dental, and vision) for employees; 75% company-paid coverage for dependents
  • Company-paid short-term and long-term disability insurance
  • Voluntary life, critical illness, and accident coverage available
  • Health Savings Accounts (HSA) – when combined with the High Deductible Health Plan
  • Employee Assistance Program and wellness resources 

Retirement & Financial Wealth 

  • 401(k) retirement plan with company match
  • Access to financial planning and legal services 

Time Off & Leave Programs 

  • Paid Time Off (PTO) and paid holidays

Growth & Development 

  • Internal skills training and advancement pathways
  • Professional development to support certifications, continuing education, or role-related training

Community & Culture 

  • Company events and team-building activities

We value diverse perspectives and believe that skills can be developed. If you're passionate about this role, we want to hear from you, whether or not you meet every criterion. Your unique experiences might be exactly what we need!

IE US Operations Inc., the employing entity and proud member of the IREN group, is an equal opportunity employer that is committed to creating an inclusive workplace. We are committed to evaluating qualified applicants and do not discriminate on the basis of characteristics protected under applicable legislation.

IE US Operations, Inc. ("IREN") participates in E-Verify and will provide the federal government with your Form I-9 information to confirm that you are authorized to work in the United States.

By applying for this position and submitting your resume and application materials, you consent to the processing of your personal information in accordance with our Job Applicant Privacy Statement available on our website at www.iren.com.  

