Senior Site Reliability Engineer

Doghouse Recruitment United States
Remote Visa Sponsorship
AI Summary

We're building a cloud platform for high-throughput, compute-heavy workloads. As a Senior SRE, you'll own production reliability end-to-end, define SLIs/SLOs, run error budget conversations, and ship changes that reduce incidents and improve latency.

Key Highlights
Define SLIs/SLOs
Run error budget conversations
Ship changes to reduce incidents and improve latency
Technical Skills Required
Linux, Kubernetes, Terraform, Docker, Helm, Go, Python, C++
Benefits & Perks
Up to $225k base salary
Additional bonus
Stock options
Full remote work in the US
Resident permit required

Job Description


We’re building a cloud platform for high-throughput, compute-heavy workloads. We operate large-scale infrastructure where failure modes are real, capacity is finite, and reliability needs to be engineered, not “handled”.


As a Senior SRE, you’ll own production reliability end-to-end: define SLIs/SLOs, run error budget conversations, and ship changes that reduce incidents and improve latency (p95/p99). You’ll build automation to kill toil, raise deployment safety (canary/rollback), and turn observability into signal instead of noise.


This is a bare-metal environment: think Linux, data centers, physical fleets, and real hardware constraints, not managed services. You’ll work close to the metal across Kubernetes internals (scheduling, autoscaling behavior, kubelet pressure/evictions, etcd/control plane), Linux performance (CPU/memory/IO contention), and network debugging (DNS/TCP/TLS, packet loss, congestion). On-call is part of the job, but success is measured by how much you reduce it.


Requirements


  • Senior-level experience in Site Reliability Engineering / Production Engineering running bare metal / on-prem / data center infrastructure (not public cloud only)
  • Deep hands-on expertise in Linux systems debugging and performance (CPU, memory, IO, kernel-level behaviors)
  • Strong understanding of networking (DNS/TCP/TLS, latency, packet loss, congestion, troubleshooting under load)
  • Strong Kubernetes experience beyond manifests: scheduler behavior, autoscaling edge cases, kubelet pressure/evictions, etcd/control plane
  • Experience with Terraform, Docker, Helm, and modern CI/CD practices
  • Some coding skills in Go, Python, and/or C++


If you are looking for complexity and a new place to nerd out on infrastructure optimisation, please apply.


BASE SALARY: up to $225k, plus additional bonus and stock options

FULL REMOTE IN USA

Resident permit required.

