We are seeking a highly technical Site Reliability Engineer with strong Linux, cloud, networking, and automation expertise to operate and improve production systems across hybrid and multi-cloud environments.
Site Reliability Engineer (Cloud & Infrastructure)
About Stateside
Stateside is a U.S.-based digital services company founded in Culver City, California. For over 12 years, we have partnered with fast-growing startups and enterprise clients to provide high-performing, scalable nearshore talent across Engineering, Product, Design, Operations, and AI.
We specialize in building long-term partnerships by recruiting, hiring, and managing top-tier LATAM talent to support U.S. companies. Our focus is on quality, cultural alignment, and sustainable growth.
About the Role
We are seeking a highly technical and hands-on Site Reliability Engineer (SRE) with strong Linux, cloud, networking, and automation expertise.
This role sits at the intersection of Infrastructure Engineering, DevOps, and Reliability. You will be responsible for operating and improving production systems across hybrid and multi-cloud environments (AWS, GCP, Azure), ensuring high availability, performance, and security.
This is a mid-to-senior level position ideal for someone with strong systems administration foundations who has evolved into cloud infrastructure and SRE practices.
Key Responsibilities
Systems Administration
- Manage and maintain Linux (RHEL, Ubuntu, CentOS) and Windows Server environments across on-prem and cloud infrastructure
- Administer Active Directory, LDAP, and cloud IAM access controls
- Perform OS patching, configuration management, and automation using Ansible or similar tools
- Manage virtualization platforms (VMware, Hyper-V, KVM)
- Maintain backup and disaster recovery procedures; validate RTO/RPO targets
Networking
- Design and troubleshoot LAN/WAN, VLANs, VPNs (IPSec, WireGuard, OpenVPN), and firewall configurations
- Manage DNS, DHCP, BGP, OSPF, and TCP/IP routing
- Monitor network performance and security using SNMP-based tools, Wireshark, and NetFlow
- Implement network segmentation and zero-trust access controls
- Administer load balancers and CDN configurations
Cloud Operations
- Operate infrastructure across AWS, GCP, and/or Azure
- Provision cloud infrastructure using Terraform or CloudFormation
- Manage VPCs, subnets, security groups, NAT gateways, and Transit Gateways
- Administer Kubernetes clusters (EKS, GKE, AKS)
- Implement IAM policies, RBAC, and least-privilege access models
Site Reliability Engineering
- Define and maintain SLIs, SLOs, and error budgets
- Build and maintain observability stacks (Prometheus, Grafana, ELK, Datadog, Zabbix)
- Participate in on-call rotations and incident management
- Conduct blameless post-mortems and root cause analysis
- Reduce operational toil through automation and self-healing mechanisms
- Perform capacity planning and performance tuning
Automation & Tooling
- Develop automation scripts in Python and Bash
- Build and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins)
- Develop runbooks and operational documentation
- Automate provisioning and configuration management workflows
Required Skills & Experience
- 3–7 years of experience in Systems Administration, Infrastructure Engineering, DevOps, or SRE roles
- Strong Linux administration skills (RHEL/CentOS/Ubuntu)
- Solid networking fundamentals (TCP/IP, DNS, DHCP, VLANs, VPN, BGP/OSPF, firewalls)
- Hands-on experience with at least one major cloud provider (AWS, GCP, or Azure)
- Experience using Terraform or other Infrastructure-as-Code tools
- Proficiency in Python and/or Bash scripting
- Experience operating Kubernetes clusters (kubectl, Helm, troubleshooting)
- Experience with monitoring/observability tools (Prometheus, Grafana, ELK, Zabbix, Nagios)
- Understanding of SRE principles (SLOs, SLIs, error budgets, incident response)
- Experience working with Git and CI/CD pipelines
Nice to Have
- Experience with Ansible for configuration management
- Knowledge of service mesh technologies (Istio, Linkerd)
- Familiarity with CIS Benchmarks and with SOC 2 and GDPR compliance
- Basic Go programming or PowerShell scripting
- Experience with Datadog, New Relic, Splunk
- Experience with Docker and container security best practices
- Experience with ITIL processes or PagerDuty
What We Offer (Stateside Benefits)
- 💵 Competitive USD salary
- 🌎 100% remote work
- 🏖 12 paid vacation days + 12 U.S. holidays
- 🎂 Birthday day off
- 🏥 International health insurance (SafetyWing)
- 💻 Equipment provided
- 📈 Long-term, stable engagement with U.S.-based clients
- 🤝 Professional growth opportunities
- 🧠 Learning & development support
- 🌴 Work-life balance-focused culture
Stateside is an equal opportunity employer dedicated to a policy of non-discrimination in employment on any basis, including age, sex, color, race, creed, national origin, religion, marital status, sexual orientation, political belief, or disability.
By submitting or sharing your application and any personal information via email, you acknowledge and agree that Stateside may store and process your details, including your CV, in its Applicant Tracking System (ATS) for recruitment purposes. If you wish to withdraw your consent and request data removal, please contact recruitment@stateside.agency.