Join Launchpad as a Senior Observability Engineer with expertise in Datadog to lead the observability strategy for a large-scale cloud-native environment. Design, operate, and improve observability capabilities to enable engineering teams to build reliable systems. Collaborate with DevOps, SRE, and development teams in an agile environment.
Job Description
Who We Are
Launchpad is a global technology partner connecting top talent with high-impact projects across North America and beyond. We specialize in Staff Augmentation and product development, helping companies scale with agility while empowering professionals to grow in meaningful, remote-first environments.
Senior Observability Engineer - Datadog SME (LATAM)
We are looking for a Senior Observability Engineer with deep expertise in Datadog to join our Digital Ops team. This role is focused on owning and evolving the observability strategy for a large-scale, cloud-native environment supporting 150+ production services across multiple regions.
As a Datadog Subject Matter Expert, you will be responsible for designing, operating, and continuously improving observability capabilities, enabling engineering teams to build reliable, performant, and cost-efficient systems. You will work closely with DevOps, SRE, and development teams in an agile environment, acting as a technical reference for observability best practices.
Start date: ASAP
Contract type: Full-Time, Remote, Contractor
Work hours and location: 8:00 AM to 4:00 PM MST
What You'll Be Doing
- Own and lead the observability architecture and strategy across cloud-native services running in multiple environments and regions.
- Act as the Datadog Subject Matter Expert, owning configuration, governance, and best practices.
- Design, implement, and maintain Datadog dashboards, monitors, alerts, SLOs, and service health views.
- Operate and optimize Datadog APM, Logs, Metrics, Synthetic Monitoring, and RUM.
- Drive alert quality improvements, signal-to-noise reduction, and proactive detection of operational issues.
- Lead Datadog cost management and usage optimization initiatives in collaboration with engineering and finance stakeholders.
- Partner with development teams to embed observability into the SDLC and production readiness processes.
- Define and document runbooks, operational procedures, and observability standards.
- Eventually participate in a shared on-call rotation, triaging and resolving production incidents, acting as incident commander when needed, and leading post-incident reviews.
- Continuously identify opportunities for automation and toil reduction across observability and operational workflows.
- Set, track, and report on operational excellence metrics including reliability, performance, availability, security, and cost.
Must-haves
- 3+ years of deep, hands-on experience with Datadog as an observability platform in production environments.
- 5+ years of experience in DevOps, SRE, or Cloud Engineering roles supporting customer-facing systems.
- Strong practical experience with Datadog APM, Logs, Metrics, dashboards, monitors, alerts, and SLOs.
- Hands-on experience with Azure, Kubernetes, Terraform, Docker, and GitOps-based workflows.
- Proven experience operating 24x7 production environments, including incident response, root cause analysis, and post-mortems.
- Solid understanding of cloud-native architectures, distributed systems, and modern observability principles.
- Ability to work independently in a fully remote, distributed team, with strong communication and collaboration skills.
Nice-to-haves
- Experience with ArgoCD, Azure DevOps CI/CD pipelines, and infrastructure automation.
- Exposure to Databricks, SQL-based systems, or data-intensive platforms.
- Hands-on experience building or extending custom DevOps/SRE tooling to reduce operational toil.
- Relevant certifications (e.g. Datadog, Azure, Cloud Architecture, ITIL).
Interview Process
- Initial Interview: 60 minutes with our Talent Acquisition Specialist
- Culture Fit: 30 minutes with our Team Engagement Manager
- Technical Assessment: Online Challenge/Multiple Choice Questionnaire
- Final Stage: 60 minutes with the Hiring Manager
We believe that great work starts with great people. At Launchpad, we offer:
- People-first culture
- Excellent compensation
- Hardware setup for working from home
- Agile methodologies
- Diverse and multicultural work environment
- Training allowances ...and more!
Compliance & Equal Opportunity
Launchpad is an equal opportunity employer committed to creating an inclusive environment for all applicants. We do not discriminate on the basis of race, color, religion, gender identity, sexual orientation, age, disability, or any other protected status under applicable laws in Canada and British Columbia.
All candidate information will be handled confidentially and used solely for recruitment purposes in accordance with applicable privacy regulations.