Senior Observability Engineer - Datadog SME (LATAM)
We are looking for a Senior Observability Engineer with deep expertise in Datadog to join our Digital Ops team. This role is focused on owning and evolving the observability strategy for a large-scale, cloud-native environment supporting 150+ production services across multiple regions.
As a Datadog Subject Matter Expert, you will be responsible for designing, operating, and continuously improving observability capabilities, enabling engineering teams to build reliable, performant, and cost-efficient systems. You will work closely with DevOps, SRE, and development teams in an agile environment, acting as a technical reference for observability best practices.
Start date: ASAP
Contract type: Full-Time, Remote, Contractor
Work hours: 8:00 AM to 4:00 PM (MST)
What You'll Be Doing
- Own and lead the observability architecture and strategy across cloud-native services running in multiple environments and regions.
- Act as the Datadog Subject Matter Expert, owning configuration, governance, and best practices.
- Design, implement, and maintain Datadog dashboards, monitors, alerts, SLOs, and service health views.
- Operate and optimize Datadog APM, Logs, Metrics, Synthetic Monitoring, and RUM.
- Drive alert quality improvements, signal-to-noise reduction, and proactive detection of operational issues.
- Lead Datadog cost management and usage optimization initiatives in collaboration with engineering and finance stakeholders.
- Partner with development teams to embed observability into the SDLC and production readiness processes.
- Define and document runbooks, operational procedures, and observability standards.
- Participate, as needed, in a shared on-call rotation: triage and resolve production incidents, act as incident commander when required, and lead post-incident reviews.
- Continuously identify opportunities for automation and toil reduction across observability and operational workflows.
- Set, track, and report on operational excellence metrics including reliability, performance, availability, security, and cost.
Must-haves
- 3+ years of deep, hands-on experience with Datadog as an observability platform in production environments.
- 5+ years of experience in DevOps, SRE, or Cloud Engineering roles supporting customer-facing systems.
- Strong practical experience with Datadog APM, Logs, Metrics, dashboards, monitors, alerts, and SLOs.
- Hands-on experience with Azure, Kubernetes, Terraform, Docker, and GitOps-based workflows.
- Proven experience operating 24x7 production environments, including incident response, root cause analysis, and post-mortems.
- Solid understanding of cloud-native architectures, distributed systems, and modern observability principles.
- Ability to work independently in a fully remote, distributed team, with strong communication and collaboration skills.
Nice-to-haves
- Experience with ArgoCD, Azure DevOps CI/CD pipelines, and infrastructure automation.
- Exposure to Databricks, SQL-based systems, or data-intensive platforms.
- Hands-on experience building or extending custom DevOps/SRE tooling to reduce operational toil.
- Relevant certifications (e.g. Datadog, Azure, Cloud Architecture, ITIL).
Here's what to expect from our candidate-friendly interview process:
- Initial Interview: 60 minutes with our Talent Acquisition Specialist
- Culture Fit: 30 minutes with our Team Engagement Manager
- Technical Assessment: online challenge or multiple-choice questionnaire
- Final Stage: 60 minutes with the Hiring Manager
We believe that great work starts with great people. At Launchpad, we offer:
- People-first culture
- Excellent compensation
- Hardware setup for working from home
- Agile methodologies
- Diverse and multicultural work environment
- Training allowances
…and more!