Senior Site Reliability Engineer

Doghouse Recruitment, United States
Remote Visa Sponsorship
This job is no longer active and is no longer accepting applications.

Job Description


We’re building a cloud platform for high-throughput, compute-heavy workloads. We operate large-scale infrastructure where failure modes are real, capacity is finite, and reliability needs to be engineered, not “handled”.


As a Senior SRE, you’ll own production reliability end-to-end: define SLIs/SLOs, run error budget conversations, and ship changes that reduce incidents and improve latency (p95/p99). You’ll build automation to kill toil, raise deployment safety (canary/rollback), and turn observability into signal instead of noise.


This is a bare-metal environment: think Linux, data centers, physical fleets, and real hardware constraints, not managed services. You’ll work close to the metal across Kubernetes internals (scheduling, autoscaling behavior, kubelet pressure/evictions, etcd/control plane), Linux performance (CPU/memory/IO contention), and network debugging (DNS/TCP/TLS, packet loss, congestion). On-call is part of the job, but success is measured by how much you reduce it.


Requirements


  • Senior-level experience in Site Reliability Engineering / Production Engineering running bare metal / on-prem / data center infrastructure (not public cloud only)
  • Deep hands-on expertise in Linux systems debugging and performance (CPU, memory, IO, kernel-level behaviors)
  • Strong understanding of networking (DNS/TCP/TLS, latency, packet loss, congestion, troubleshooting under load)
  • Strong Kubernetes experience beyond manifests: scheduler behavior, autoscaling edge cases, kubelet pressure/evictions, etcd/control plane
  • Experience with Terraform, Docker, Helm, and modern CI/CD practices
  • Some coding skills in Go, Python, and/or C++


If you are looking for complexity and a new place to nerd out on infrastructure optimisation, please apply.


BASE SALARY: up to $225k, plus additional bonus and stock options

FULLY REMOTE IN THE USA

Residence permit required.

