Site Reliability Engineer (Contract) - AWS, Kubernetes, Observability

Digital Gurus • United Kingdom

Remote

AI Summary

A market-leading consultancy seeks a Contract Site Reliability Engineer to support an AWS-hosted data platform. Key responsibilities include improving reliability, observability and automation for critical data services. The role requires around 3-5 years of SRE/DevOps experience, with strong skills in AWS, Kubernetes/EKS and monitoring tools.

Key Highlights

Support an AWS-hosted data platform focusing on reliability, observability, automation, and operational excellence.

Define and operationalise SLIs, SLOs and error budgets for critical data services.

Build and maintain observability dashboards and monitoring frameworks using tools like Dynatrace and Prometheus.

Key Responsibilities

Define and operationalise SLIs, SLOs and error budgets for critical data services and platform components.

Build and maintain observability dashboards and monitoring frameworks using tools such as Dynatrace, Prometheus and associated monitoring/logging/tracing platforms.

Implement end-to-end monitoring across metrics, logs and traces, helping the team detect issues proactively before they impact users.

Work across the AWS ecosystem, supporting workloads running on EKS / Kubernetes.

Collaborate closely with developers, architects and platform teams to improve reliability, scalability, performance and operational resilience.

Support incident response, root cause analysis and blameless post-mortems, helping drive long-term improvements rather than short-term fixes.

Automate repetitive operational tasks to reduce toil and improve platform efficiency.

Help establish and track the key golden signals: latency, traffic, errors and saturation.

Contribute to reliability and resilience backlogs, helping identify improvements across monitoring, alerting, automation and platform stability.

Technical Skills Required

AWS, Kubernetes, EKS, Dynatrace, Prometheus, Grafana, CloudWatch, OpenTelemetry, ELK, Terraform, Python, Bash, Ansible

Benefits & Perks

£300-400 per day rate

Fully Remote

Nice to Have

Experience working with data platforms, data pipelines or data-heavy environments.

Exposure to batch or streaming data workloads, such as Kafka, Spark, Airflow, Glue, EMR, Databricks or similar.

Data observability experience.

Previous public sector experience across multiple engagements.

Active or lapsed SC clearance.

Consultancy experience would be beneficial, especially within environments such as Capgemini, Accenture, BJSS, Kainos, Sopra Steria, CGI, PA Consulting, Leidos, BAE Digital Intelligence, Deloitte, Cognizant or similar.

Job Description

Site Reliability Engineer (Contract)

Type: Contract

WFH: Fully Remote

Rate: £300-400 per day (Inside IR35)

Location: Remote

Experience level: Around 3-5 years, though candidates with slightly more experience may be considered if their rate expectations align

A market-leading consultancy is looking for a Site Reliability Engineer to support an AWS-hosted data platform, working across reliability, observability, automation and operational excellence.

This role would suit an SRE, DevOps Engineer or Platform Engineer with strong AWS experience, hands-on Kubernetes/EKS exposure, and a good understanding of observability, monitoring and incident management. The successful contractor will help define and operationalise SLIs, SLOs and error budgets across critical data services, ensuring the platform is reliable, scalable and well-monitored. Public sector experience is highly desirable, and SC clearance is preferred but not required.

Key Responsibilities

You will define and operationalise SLIs, SLOs and error budgets for critical data services and platform components.

You will build and maintain observability dashboards and monitoring frameworks using tools such as Dynatrace, Prometheus and associated monitoring/logging/tracing platforms.
You will implement end-to-end monitoring across metrics, logs and traces, helping the team detect issues proactively before they impact users.
You will work across the AWS ecosystem, supporting workloads running on EKS / Kubernetes.
You will collaborate closely with developers, architects and platform teams to improve reliability, scalability, performance and operational resilience.
You will support incident response, root cause analysis and blameless post-mortems, helping drive long-term improvements rather than short-term fixes.
You will automate repetitive operational tasks to reduce toil and improve platform efficiency.
You will help establish and track the key golden signals: latency, traffic, errors and saturation.
You will contribute to reliability and resilience backlogs, helping identify improvements across monitoring, alerting, automation and platform stability.

Essential Skills

Strong commercial experience as an SRE, DevOps Engineer, Platform Engineer or Cloud Engineer.
Strong AWS experience, ideally within production-scale environments.

Hands-on experience with Kubernetes, ideally Amazon EKS.
Experience with observability and monitoring tools such as Dynatrace, Prometheus, Grafana, CloudWatch, OpenTelemetry, ELK or similar.
Understanding of SLIs, SLOs, error budgets and golden signals.
Experience supporting incident management, root cause analysis and post-incident improvement work.
Automation experience using scripting or IaC tooling such as Terraform, Python, Bash, Ansible or similar.
Good understanding of platform reliability, scalability, resilience and performance.

Desirable Skills

Experience working with data platforms, data pipelines or data-heavy environments.
Exposure to batch or streaming data workloads, such as Kafka, Spark, Airflow, Glue, EMR, Databricks or similar.
Data observability experience.
Previous public sector experience across multiple engagements.
Active or lapsed SC clearance.
Consultancy experience would be beneficial, especially within environments such as Capgemini, Accenture, BJSS, Kainos, Sopra Steria, CGI, PA Consulting, Leidos, BAE Digital Intelligence, Deloitte, Cognizant or similar.

Job Overview

Posted Date: Jun 20, 2026

Employment Type: Contract

Experience Level: Mid-Senior level

Location: United Kingdom

Category: DevOps

Company: Digital Gurus
