Senior Site Reliability Engineer (SRE) - Cloud-Native Platforms - Fully Remote (Germany)
Lead the design, implementation, and improvement of system reliability, observability, and performance for a mission-critical, cloud-native platform. Collaborate with cross-functional teams to embed reliability-by-design. Requires strong experience as a Site Reliability Engineer, hands-on production experience with Kubernetes, and German language proficiency.
Job Description
Senior Site Reliability Engineer (SRE)
Fully Remote (Germany)
Permanent | Full-time
We are supporting a German technology company operating mission-critical, cloud-native platforms that serve internal engineering teams and external customers at scale.
The company treats reliability as a product feature, not an afterthought, and is now strengthening its platform team with a Senior Site Reliability Engineer who will take real ownership of stability, observability, and performance.
This is a fully remote role within Germany and requires professional German language skills.
The Role
As a Senior SRE, you will apply software engineering principles to infrastructure and operations challenges.
You will work closely with platform and development teams to ensure systems are:
- Highly available
- Observable end-to-end
- Performant under load
- Automated by default
This is not a ticket-based operations role. You will influence how reliability is designed, measured, and improved across the platform.
Key Responsibilities
- Own and improve system reliability, uptime, and performance
- Design and operate observability stacks (metrics, logs, traces)
- Define and implement SLIs, SLOs, and error budgets
- Conduct load testing, performance tuning, and capacity planning
- Reduce operational toil through automation and tooling
- Lead or contribute to incident response and post-incident reviews
- Collaborate closely with engineers to embed reliability-by-design
Technical Environment
- Kubernetes / OpenShift in production
- Docker & Helm for packaging and deployment
- Python or TypeScript for automation and tooling
- Modern monitoring and observability platforms
- Cloud-native and container-first architecture
What We’re Looking For
- Strong experience as a Site Reliability Engineer, Platform Engineer, or Senior DevOps Engineer
- Hands-on production experience with Kubernetes
- Solid understanding of observability and incident management
- Automation mindset and comfort writing production-quality code
- Calm, methodical approach to problem-solving in live environments
- German language proficiency (spoken and written)
Why This Role Stands Out
- Fully remote (Germany)
- High ownership and technical influence
- Clear commitment to SRE best practices
- Engineering-driven culture with minimal bureaucracy
- Real-world scale and meaningful reliability challenges