Senior Chaos Engineer - Relocation Required

hire tech services • United States
Relocation

Job Description


Location: St. Louis, MO (in office 3 days per week)


Note: Relocation is mandatory.




Responsibilities:


Design and run chaos experiments to test system reliability, fault tolerance, and recovery.

Build automated chaos tests using tools such as Gremlin, Litmus, Chaos Mesh, and AWS Fault Injection Simulator (FIS).

Identify failure points in microservices, APIs, and cloud infrastructure.

Collaborate with SRE, DevOps, and Development teams to improve resilience.

Document findings, create remediation plans, and drive resilience best practices.


Required Skills:


6+ years of experience in SRE, DevOps, or platform engineering, with strong knowledge of distributed systems.

Hands-on experience with chaos engineering tools (Gremlin, Litmus, FIS, Chaos Mesh).

Strong knowledge of Kubernetes, microservices, container orchestration, and cloud platforms (AWS, Azure, or GCP).

Experience with monitoring tools (Prometheus, Grafana, Datadog, Splunk).

Solid scripting and programming skills in Python, Bash, or Go.

