Senior DevOps Engineer for AI Infrastructure

Andromeda • United States

Remote


AI Summary

We are expanding to new frontiers to find the brightest people working in AI infrastructure, research, and engineering. Provision, configure, and operate Kubernetes-based clusters for customers across multiple providers. Build automation and tooling to streamline cluster deployments and integrations.

Key Highlights

Provision and operate Kubernetes-based clusters

Build automation and tooling

Debug customer issues

Key Responsibilities

Provision, configure, and operate Kubernetes-based clusters for customers

Build automation and tooling to streamline cluster deployments and integrations

Debug customer issues across networking, storage, scheduling, and system layers

Technical Skills Required

Linux systems, Kubernetes, container orchestration, Infrastructure-as-Code (Terraform, Helm, Ansible), Python, Go, Bash, Prometheus, Grafana, Loki, Datadog

Benefits & Perks

Remote work

Full-time employment

Nice to Have

Exposure to ML/AI infrastructure or GPU-based systems

Familiarity with high-performance networking or distributed storage

Job Description

Location: Global Remote / San Francisco

Full-Time

About Andromeda

Andromeda Cluster was founded by Nat Friedman and Daniel Gross to give early-stage startups access to the kind of scaled AI infrastructure once reserved only for hyperscalers.

We began with a single managed cluster — but it filled almost instantly. Since then, we’ve been quietly building the systems, network, and orchestration layer that makes the world’s AI infrastructure more accessible.

Today, Andromeda works with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed most. Our platform routes training and inference jobs across global supply, unlocking flexibility and efficiency in one of the fastest-growing markets on earth.

Our long-term vision is to build the liquidity layer for global AI compute: a marketplace that moves the infrastructure and workloads powering AGI, much as the world's financial markets move capital.

We are expanding to new frontiers to find the brightest people working in AI infrastructure, research, and engineering.

What You’ll Do

Provision, configure, and operate Kubernetes-based clusters for customers across multiple providers.
Build automation and tooling to streamline cluster deployments and integrations.
Debug customer issues across networking, storage, scheduling, and system layers.
Improve reliability and scalability of both training and inference infrastructure.
Design and implement monitoring, alerting, and observability for critical systems.
Collaborate with engineering and product teams to plan and deliver infrastructure for new services.
Participate in on-call and incident response, leading postmortems and reliability improvements.

What We’re Looking For

5+ years of experience in SRE, DevOps, or infrastructure engineering roles.
Strong Linux systems and networking fundamentals.
Deep experience with Kubernetes and container orchestration at scale.
Proficiency with Infrastructure-as-Code (Terraform, Helm, Ansible, etc.).
Strong automation and scripting skills (Python, Go, or Bash).
Experience with observability stacks (Prometheus, Grafana, Loki, Datadog, etc.).
Track record of operating production systems and leading incident response.

Nice to Have

Exposure to ML/AI infrastructure or GPU-based systems (CUDA, Slurm, Triton, etc.).
Familiarity with high-performance networking (InfiniBand, NVLink) or distributed storage (VAST, Weka, Ceph).
Customer-facing support or consulting experience.

Why You’ll Love It Here

This is a builder’s role. You’ll have ownership and autonomy to shape how our systems run, working directly with customers and providers while building the foundation for reliable, scalable AI infrastructure.

Job Overview

Posted Date Mar 03, 2026

Employment Type Full-time

Experience Level Mid-Senior level

Location United States

Category DevOps

Company Andromeda


