Senior DevOps Engineer - Industrial Data Infrastructure

Savages Corp • Hungary
Remote
AI Summary

Ensure high availability of industrial data pipelines, lead incident response, and manage AWS infrastructure. 5-8+ years in a DevOps/SRE role required. Remote, async-first collaboration.

Key Highlights
Own SLOs and incident response for critical industrial data pipelines
Operate and harden AWS infrastructure for OT data flows
Maintain deployment pipelines and run lakehouse core in production
Key Responsibilities
Own SLOs, SLAs, and error budgets for industrial data pipelines
Lead detection, response, and resolution of production incidents
Operate and harden AWS infrastructure for OT data flows
Maintain deployment pipelines with progressive delivery
Run the lakehouse + streaming core in production
Manage secrets, enforce least-privilege IAM, integrate security scanning
Technical Skills Required
AWS (EKS, Lambda, IoT Core, Kinesis)
Microsoft Azure (AKS, Container Apps, Application Gateway + WAFv2, ADLS)
Kubernetes, Terraform or Pulumi
Prometheus, Grafana, Loki, OpenTelemetry
Apache Kafka, Iceberg/Trino, Airflow/dbt
Benefits & Perks
Fully remote with flexible hours
Real-world stakes: affect actual factory lines
International team and async-first collaboration
Nice to Have
Background in industrial, IoT, or manufacturing environments

Job Description


About us


We enable manufacturers to thrive in the era of Industry 4.0 by combining cutting-edge technology with deep compliance expertise. Our platforms and services help manufacturing enterprises harness the power of data intelligence, automation, and connected systems to drive smarter decision-making, optimise operations, and accelerate digital transformation. We design and deliver scalable applications and advanced analytics that empower factories, supply chains, and production environments to become more intelligent, adaptive, and resilient.


At Savages Corp we accelerate the transition towards AI-supported businesses of the future. We run mission-critical industrial data infrastructure where downtime has real-world consequences on factory floors. As our DevOps Engineer you are the operational backbone - setting SLOs, owning incidents, hardening the platform, and making sure every pipeline that connects a sensor to a decision never quietly fails at 3am.


The Mindset We Are Looking For


  • You've been paged at 2am and fixed it before morning standup - ops pressure sharpens you, it doesn't rattle you.
  • You write runbooks after every incident because you build systems that don't require you to be awake to survive.
  • You close problems at the systemic level, not the symptom level — blameless post-mortems are second nature.
  • SLOs, error budgets, and chaos testing aren't optional ceremonies; they're how you sleep at night.


Key Responsibilities


  • Own SLOs, SLAs, and error budgets for industrial data pipelines - you are the primary on-call escalation point for critical infrastructure incidents.
  • Lead detection, response, and resolution of production incidents; write blameless post-mortems and drive fixes that eliminate repeat failures.
  • Own the full observability stack: metrics, logs, and tracing. Build alerting that pages on signal, not noise, and dashboards engineers actually use.
  • Operate and harden AWS infrastructure (EKS, Lambda, IoT Core, Kinesis) with a focus on fault tolerance, auto-scaling, and disaster recovery for OT data flows.
  • Maintain deployment pipelines with progressive delivery, automated rollback triggers, and health checks that catch regressions before they reach production.
  • Run the lakehouse + streaming core in production: Apache Kafka, Iceberg on a Polaris catalog, a Trino query layer, and Airflow/dbt pipelines. This includes backup, recovery, capacity planning, and performance tuning under real industrial load.
  • Manage secrets, enforce least-privilege IAM, own vulnerability patching, and integrate security scanning into delivery pipelines.
  • Run regular game days, failure injection tests, and capacity reviews to validate resilience before the factory floor does it for you.


What You Bring


  • 5–8+ years in a DevOps, SRE, or Platform Engineering role with primary accountability for production reliability - not just deployment automation.
  • Proven ownership of on-call rotations and incident command for high-throughput, event-driven systems.
  • Strong hands-on experience with Prometheus, Grafana, Loki, and OpenTelemetry.
  • Solid command of Microsoft Azure (AKS, Container Apps, Application Gateway + WAFv2, ADLS), AWS (EKS, Lambda, IoT Core, Kinesis), Kubernetes, and Terraform or Pulumi.
  • Operational experience running Apache Kafka in production; experience with Iceberg/Trino or comparable lakehouse stacks is a strong plus.
  • Python and Bash scripting for automation; Go is a bonus.
  • Background in industrial, IoT, or manufacturing environments is a strong differentiator.


What We Offer


  • Full ownership with minimal bureaucracy and maximum influence over operational standards.
  • Fully remote with flexible hours and async-first collaboration — on-call is structured, not constant.
  • Real-world stakes: your uptime decisions affect actual factory lines, not just dashboards.
  • Work across the full OT-to-cloud-to-AI stack alongside an experienced international team.


Logistics

  • Contract: Freelance / Full-time
  • Time zone: CET ±2
  • Travel: Open to occasional travel
  • Language: High proficiency in English required, German nice to have
  • Rate range: €35-€38 per hour


