Design, build, and operate enterprise-grade observability platforms. Collaborate with SREs and product teams to define SLOs and transform raw telemetry into actionable insights. Improve incident response efficiency in a fast-paced, engineering-driven environment.
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Observability Engineer based in the United States.
This role is focused on building and scaling the observability backbone that enables engineering teams to operate complex distributed systems with confidence. You will design and run end-to-end telemetry platforms covering metrics, logs, traces, and events, ensuring high signal quality and operational reliability. The position spans both infrastructure and software engineering, combining platform architecture with hands-on implementation of monitoring, alerting, and tracing systems. You will work closely with SREs, platform engineers, and product teams to define meaningful SLOs and transform raw telemetry into actionable insights. The environment is fast-paced and engineering-driven, with a strong emphasis on automation, scalability, and developer experience. This is a high-impact role where your work directly influences system reliability, incident response efficiency, and production visibility across the organization.
Accountabilities
- Design, build, and operate enterprise-grade observability platforms across metrics, logs, traces, and events.
- Architect and maintain scalable monitoring stacks using Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Datadog.
- Define and implement SLOs, SLIs, error budgets, and alerting strategies aligned with system reliability goals.
- Develop high-quality dashboards, alerts, and observability standards to reduce noise and improve signal accuracy.
- Manage distributed tracing pipelines and enable teams to diagnose latency and performance issues effectively.
- Operate large-scale time-series and log systems, optimizing for performance, retention, and cost efficiency.
- Build self-service observability tooling, templates, and libraries to improve adoption across engineering teams.
- Integrate observability practices into CI/CD pipelines, incident response workflows, and progressive delivery systems.
- Improve incident response readiness through better alerting hygiene, dashboards, and postmortem tooling.
- Maintain clear documentation, onboarding guides, and runbooks for observability systems and standards.
- Mentor engineers on observability best practices, debugging techniques, and SRE principles.
Requirements
- Bachelor’s degree in Computer Science or a related technical field.
- 5+ years of experience in SRE, platform engineering, or observability-focused roles.
- Strong hands-on experience with Prometheus, Grafana, and at least one commercial observability tool (Datadog, New Relic, or Splunk).
- Deep understanding of OpenTelemetry, distributed tracing, and structured logging practices.
- Proficiency in at least one programming language (Go, Python, or Java).
- Experience operating high-scale metrics and logging pipelines with attention to performance and cost.
- Strong knowledge of SLOs, error budgets, and reliability engineering principles.
- Experience integrating observability into CI/CD pipelines and incident management tools.
- Solid understanding of Linux systems, networking fundamentals, and containerized environments.
- Strong communication skills and ability to collaborate across engineering and operations teams.
- Exposure to tools such as Thanos, Mimir, Cortex, Loki, or Tempo is a plus.
- Experience with observability cost optimization or eBPF-based tooling is a strong advantage.
Benefits
- Competitive annual salary ranging from $100,000 to $150,000 based on experience.
- 100% remote role within the continental United States.
- Full-time W2 employment with long-term, multi-year engagement stability.
- Comprehensive benefits package including healthcare and standard employee benefits.
- Opportunity to work on large-scale distributed systems and modern observability stacks.
- Exposure to industry-leading tools and cloud-native observability technologies.
- Strong engineering culture focused on reliability, automation, and continuous improvement.
- Career growth opportunities in SRE, platform engineering, and cloud observability domains.
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.