Site Reliability Engineer

Hydrolix • United States

Remote

AI Summary

Hydrolix is seeking a Site Reliability Engineer to contribute to the reliability and scalability of its data platform. The ideal candidate will have deep expertise in system reliability and automation. This is a highly technical, hands-on role that also calls for excellent communication skills.

Key Highlights

Infrastructure Reliability

Service Optimization

CI/CD Management

Technical Skills Required

Kubernetes, Cloud Platforms (AWS, GCP, Azure, or Linode), Prometheus, Vector, Grafana, Superset, Kibana, SQL databases (PostgreSQL), Python, Go, Rust, Linux

Benefits & Perks

Remote work

On-call support

Job Description

Site Reliability Engineer

LOCATION | Hydrolix | USCAN, Remote

At Hydrolix, we are revolutionizing the world of data management and analytics with our innovative cloud data platform, purpose-built for petabyte-scale datasets. Our mission is to help organizations drastically reduce data costs while increasing their data retention.

We are looking for a Site Reliability Engineer (SRE) to join our dynamic Services team. In this role, you will contribute to the reliability and scalability of our cutting-edge platform, ensuring exceptional solutions tailored to our customers’ unique needs. This is a highly technical, hands-on role that requires deep expertise in system reliability and automation.

Key Responsibilities

Infrastructure Reliability: Deploy, maintain, and ensure a highly reliable fleet of Kubernetes clusters and Hydrolix deployments across multiple cloud platforms.
Service Optimization: Design, implement, and maintain systems and processes to enhance the reliability, availability, and performance of our services.
CI/CD Management: Build and optimize CI/CD tools and processes to ensure efficient and reliable deployments.
Monitoring and Incident Response: Develop and manage monitoring, alerting, and incident response strategies to minimize downtime and enable rapid recovery.
Root Cause Analysis: Conduct comprehensive root cause analyses for system failures, implementing long-term preventive measures.
Automation and Efficiency: Automate repetitive tasks and optimize system performance to improve operational efficiency.
On-Call Support: Participate in on-call coverage during weekday business hours and a once-monthly weekend shift.

Collaboration and Customer Engagement

Cross-Functional Teamwork: Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into every stage of the development lifecycle.
Reliability Advocacy: Champion SRE best practices and foster a culture of operational excellence across the organization.
Global Team Collaboration: Collaborate with a distributed team of engineers worldwide to provide round-the-clock support.
Customer Support: Interface with customers to address and resolve reported incidents, ensuring a seamless user experience.

Qualifications and Skills

SRE Expertise: Proven experience as a Site Reliability Engineer or in a similar role, with at least five years supporting complex distributed systems.
Observability Tools: Experience with monitoring and debugging tools like Prometheus, Vector, Grafana, Superset, or Kibana.
Cloud Platforms: Proficiency in at least one major cloud platform (AWS, GCP, Azure, or Linode).
Database Knowledge: Experience with SQL databases; familiarity with PostgreSQL is a plus but not required.
Programming Skills: Proficiency in programming languages such as Python, Go, or Rust.
Linux Expertise: Strong experience with Linux systems, including performance tuning and system-level troubleshooting.
Communication Skills: Excellent written and verbal communication skills, with the ability to convey technical concepts clearly to diverse audiences, including customers and cross-functional teams.

Job Overview

Posted Date Mar 12, 2026

Employment Type Full-time

Experience Level Mid-Senior level

Location United States

Category Devops

Company Hydrolix
