Senior Observability Engineer - Build & Scale Distributed Systems

Jobgether • United States

Remote

AI Summary

Design, build, and operate enterprise-grade observability platforms. Collaborate with SREs and product teams to define SLOs and transform raw telemetry into actionable insights. Improve incident response efficiency in a fast-paced, engineering-driven environment.

Key Highlights

Build and scale observability platforms across metrics, logs, traces, and events

Define and implement SLOs, SLIs, and alerting strategies

Develop high-quality dashboards and observability standards

Manage distributed tracing pipelines and optimize large-scale time-series systems

Technical Skills Required

Prometheus, Grafana, Loki, Tempo, OpenTelemetry, Datadog, Go, Python, Java

Benefits & Perks

Competitive annual salary ranging from $100,000 to $150,000

100% remote role within the continental United States

Comprehensive benefits package

Job Description

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Observability Engineer based in the United States.

This role is focused on building and scaling the observability backbone that enables engineering teams to operate complex distributed systems with confidence. You will design and run end-to-end telemetry platforms covering metrics, logs, traces, and events, ensuring high signal quality and operational reliability. The position spans both infrastructure and software engineering, combining platform architecture with hands-on implementation of monitoring, alerting, and tracing systems. You will work closely with SREs, platform engineers, and product teams to define meaningful SLOs and transform raw telemetry into actionable insights. The environment is fast-paced and engineering-driven, with a strong emphasis on automation, scalability, and developer experience. This is a high-impact role where your work directly influences system reliability, incident response efficiency, and production visibility across the organization.

Accountabilities

Design, build, and operate enterprise-grade observability platforms across metrics, logs, traces, and events.
Architect and maintain scalable monitoring stacks using Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Datadog.
Define and implement SLOs, SLIs, error budgets, and alerting strategies aligned with system reliability goals.
Develop high-quality dashboards, alerts, and observability standards to reduce noise and improve signal accuracy.
Manage distributed tracing pipelines and enable teams to diagnose latency and performance issues effectively.
Operate large-scale time-series and log systems, optimizing for performance, retention, and cost efficiency.
Build self-service observability tooling, templates, and libraries to improve adoption across engineering teams.
Integrate observability practices into CI/CD pipelines, incident response workflows, and progressive delivery systems.
Improve incident response readiness through better alerting hygiene, dashboards, and postmortem tooling.
Maintain clear documentation, onboarding guides, and runbooks for observability systems and standards.
Mentor engineers on observability best practices, debugging techniques, and SRE principles.

Requirements

Bachelor’s degree in Computer Science or a related technical field.
5+ years of experience in SRE, platform engineering, or observability-focused roles.
Strong hands-on experience with Prometheus, Grafana, and at least one commercial observability tool (Datadog, New Relic, or Splunk).
Deep understanding of OpenTelemetry, distributed tracing, and structured logging practices.
Proficiency in at least one programming language (Go, Python, or Java).
Experience operating high-scale metrics and logging pipelines with attention to performance and cost.
Strong knowledge of SLOs, error budgets, and reliability engineering principles.
Experience integrating observability into CI/CD pipelines and incident management tools.
Solid understanding of Linux systems, networking fundamentals, and containerized environments.
Strong communication skills and ability to collaborate across engineering and operations teams.
Exposure to tools such as Thanos, Mimir, Cortex, Loki, or Tempo is a plus.
Experience with observability cost optimization or eBPF-based tooling is a strong advantage.

Benefits

Competitive annual salary ranging from $100,000 to $150,000 based on experience.
100% remote role within the continental United States.
Full-time W2 employment with long-term, multi-year engagement stability.
Comprehensive benefits package including healthcare and standard employee benefits.
Opportunity to work on large-scale distributed systems and modern observability stacks.
Exposure to industry-leading tools and cloud-native observability technologies.
Strong engineering culture focused on reliability, automation, and continuous improvement.
Career growth opportunities in SRE, platform engineering, and cloud observability domains.

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Job Overview

Posted Date: Jun 20, 2026

Employment Type: Full-time

Experience Level: Not Applicable

Location: United States

Annual Salary: 100,000 to 150,000 USD

Category: Programming

Company: Jobgether

Similar Jobs

Explore other opportunities that match your interests

Senior FPGA Firmware Engineer II - Remote (Space Domain Awareness)

Programming • 2h ago

Premium Job

Actalent

United States

Junior Software Engineer - Core API Operations

Programming • 2h ago

Premium Job

next match ai

United States

Training Manager

Programming • 3h ago

Visa Sponsorship, Relocation, Remote

Job Type: Full-time

Experience Level: Not Applicable

joblet-ai

United States
