Site Reliability Engineer – Incident Response & Data-Driven Reliability Strategy

BayOne Solutions United States
Remote
AI Summary

We are seeking a Site Reliability Engineer with 4+ years of experience in SRE, DevOps, or Systems Engineering to help clients improve stability and reliability. The role focuses on transforming raw incident logs into actionable reliability strategies through systems engineering and data science. Key responsibilities include managing production environments at scale, building observability tooling, and designing container orchestration and cloud infrastructure solutions.

Key Highlights
4+ years in SRE, DevOps, or Systems Engineering managing production environments at scale
Strong experience with SQL and data analysis
Expertise in Golang, Java, Python, or C++
Deep understanding of alerting systems, distributed tracing, structured logging, and metrics collection
Experience with Kubernetes and GCP cloud infrastructure
Key Responsibilities
Help clients understand where to improve stability and reliability
Build tooling and culture to transform raw incident logs into actionable reliability strategies
Manage production environments at scale
Design and implement container orchestration and cloud infrastructure solutions
Technical Skills Required
SRE, DevOps, Systems Engineering, SQL, data analysis, Golang, Java, Python, C++, alerting systems, distributed tracing, structured logging, metrics collection, Kubernetes, GCP
Benefits & Perks
100% Remote – US Local Only

Job Description


Role: Site Reliability Engineer

Location: 100% Remote – US Local Only


Project Outline:

  • We are looking for a Site Reliability Engineer with experience in incident response. In this role, you will help the client understand where it can improve stability and reliability.
  • The role focuses on the intersection of systems engineering and data science: building the tooling and culture necessary to transform raw incident logs into actionable reliability strategies.


Skill Requirements:

  • Engineering Background: 4+ years in SRE, DevOps, or Systems Engineering roles managing production environments at scale.
  • Data Proficiency: Strong experience with SQL and data analysis.
  • Coding Skills: Expertise in one or more programming languages such as Golang, Java, Python, or C++.
  • Observability Expertise: Deep understanding of alerting systems, distributed tracing, structured logging, and metrics collection.
  • Systems Design: Experience with container orchestration (Kubernetes) and cloud infrastructure (GCP).
