Site Reliability Engineer (Cloud & Infrastructure)

Stateside
Latin America
Remote
Job Description




About Stateside

Stateside is a U.S.-based digital services company founded in Culver City, California. For over 12 years, we have partnered with fast-growing startups and enterprise clients to provide high-performing, scalable nearshore talent across Engineering, Product, Design, Operations, and AI.

We specialize in building long-term partnerships by recruiting, hiring, and managing top-tier LATAM talent to support U.S. companies. Our focus is on quality, cultural alignment, and sustainable growth.


About the Role

We are seeking a highly technical and hands-on Site Reliability Engineer (SRE) with strong Linux, cloud, networking, and automation expertise.


This role sits at the intersection of Infrastructure Engineering, DevOps, and Reliability. You will be responsible for operating and improving production systems across hybrid and multi-cloud environments (AWS, GCP, Azure), ensuring high availability, performance, and security.

This is a mid-to-senior level position ideal for someone with strong systems administration foundations who has evolved into cloud infrastructure and SRE practices.


Key Responsibilities

Systems Administration

  • Manage and maintain Linux (RHEL, Ubuntu, CentOS) and Windows Server environments across on-prem and cloud infrastructure
  • Administer Active Directory, LDAP, and cloud IAM access controls
  • Perform OS patching, configuration management, and automation using Ansible or similar tools
  • Manage virtualization platforms (VMware, Hyper-V, KVM)
  • Maintain backup and disaster recovery procedures; validate RTO/RPO targets

Networking

  • Design and troubleshoot LAN/WAN, VLANs, VPNs (IPSec, WireGuard, OpenVPN), and firewall configurations
  • Manage DNS, DHCP, BGP, OSPF, and TCP/IP routing
  • Monitor network performance and security using SNMP tools, Wireshark, and NetFlow
  • Implement network segmentation and zero-trust access controls
  • Administer load balancers and CDN configurations

Cloud Operations

  • Operate infrastructure across AWS, GCP, and/or Azure
  • Provision cloud infrastructure using Terraform or CloudFormation
  • Manage VPCs, subnets, security groups, NAT gateways, and Transit Gateways
  • Administer Kubernetes clusters (EKS, GKE, AKS)
  • Implement IAM policies, RBAC, and least-privilege access models

Site Reliability Engineering

  • Define and maintain SLIs, SLOs, and error budgets
  • Build and maintain observability stacks (Prometheus, Grafana, ELK, Datadog, Zabbix)
  • Participate in on-call rotations and incident management
  • Conduct blameless post-mortems and root cause analysis
  • Reduce operational toil through automation and self-healing mechanisms
  • Perform capacity planning and performance tuning

Automation & Tooling

  • Develop automation scripts in Python and Bash
  • Build and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins)
  • Develop runbooks and operational documentation
  • Automate provisioning and configuration management workflows


Required Skills & Experience

  • 3–7 years of experience in Systems Administration, Infrastructure Engineering, DevOps, or SRE roles
  • Strong Linux administration (RHEL/CentOS/Ubuntu)
  • Solid networking fundamentals (TCP/IP, DNS, DHCP, VLANs, VPN, BGP/OSPF, firewalls)
  • Hands-on experience with at least one major cloud provider (AWS, GCP, or Azure)
  • Experience using Terraform or other Infrastructure-as-Code tools
  • Proficiency in Python and/or Bash scripting
  • Experience operating Kubernetes clusters (kubectl, Helm, troubleshooting)
  • Experience with monitoring/observability tools (Prometheus, Grafana, ELK, Zabbix, Nagios)
  • Understanding of SRE principles (SLOs, SLIs, error budgets, incident response)
  • Experience working with Git and CI/CD pipelines


Nice to Have

  • Experience with Ansible for configuration management
  • Knowledge of service mesh technologies (Istio, Linkerd)
  • Familiarity with CIS Benchmarks, SOC 2, and GDPR compliance
  • Basic Go or PowerShell scripting
  • Experience with Datadog, New Relic, or Splunk
  • Knowledge of Docker and container security best practices
  • Experience with ITIL processes or PagerDuty


What We Offer (Stateside Benefits)

  • 💵 Competitive USD salary
  • 🌎 100% remote work
  • 🏖 12 paid vacation days + 12 U.S. holidays
  • 🎂 Birthday day off
  • 🏥 International health insurance (SafetyWing)
  • 💻 Equipment provided
  • 📈 Long-term, stable engagement with U.S.-based clients
  • 🤝 Professional growth opportunities
  • 🧠 Learning & development support
  • 🌴 Work-life balance-focused culture



Stateside is an equal opportunity employer dedicated to a policy of non-discrimination in employment on any basis, including age, sex, color, race, creed, national origin, religion, marital status, sexual orientation, political belief, or disability.


By submitting or sharing your application and any personal information via email, you acknowledge and agree that Stateside may store and process your details, including your CV, in its Applicant Tracking System (ATS) for recruitment purposes. If you wish to withdraw your consent and request data removal, please contact recruitment@stateside.agency.

