Site Reliability Engineer

Agentero, United States
Remote
AI Summary

We're looking for a skilled Site Reliability Engineer to improve our observability stack, create runbooks, and build infrastructure improvements. The ideal candidate is passionate about reliability, has experience with Infrastructure-as-Code tools, and is proficient in Linux systems administration and troubleshooting. This is a remote-first opportunity with a competitive salary and benefits package.

Key Highlights
Improve observability stack
Create runbooks
Build infrastructure improvements
Key Responsibilities
Design and implement monitoring solutions
Create and maintain runbooks
Build and maintain cloud infrastructure
Participate in blameless post-mortems
Help build a culture of reliability
Technical Skills Required
Infrastructure-as-Code, Linux systems administration, Troubleshooting, Go, Python, Terraform, AWS, GCP, Datadog, Prometheus, Grafana
Benefits & Perks
Salary: 45-65K EUR
Remote work
Home office setup budget
Training and development budget
Business-hours on-call
International team
Team offsites
Nice to Have
Experience with GCP platform
Background in incident management
Experience with CI/CD pipelines

Job Description


🔍 We’re on the lookout for a Site Reliability Engineer!

45-65K EUR | Fully Remote (LATAM) | Series A startup backed by top US VCs.

At Agentero we believe in simple and smart solutions for complex problems.

We are building cutting-edge technology to help insurance agents serve their customers more effectively and help them grow their businesses. We do so through a data-driven platform that provides insurance agents with market access to digital carriers.

Agentero is a remote-first Silicon Valley startup with a top-talent team spread across Spain and the US. We've raised a $13.5M Series A, bringing our total funding to over $20M, with participation from top investors such as Foundation Capital (Uber, Netflix), USV (Twitter, MongoDB), and Mundi Ventures (WeFox). It's going to be huge! This is just the start…

🚀 The Opportunity

We're in search of a skilled Site Reliability Engineer based in Latin America to join our engineering team. This role is aligned with US business hours, enabling our follow-the-sun on-call model across our distributed team.

You'll work on improving our observability stack, creating runbooks that transform incident learnings into automation, and building infrastructure improvements that scale with our growth. This role is ideal for someone who is passionate about reliability, allergic to manual toil, and believes that every incident is an opportunity to make the system better.

📣 What You'll Do

  • Observability & Monitoring — You will design and implement monitoring solutions that alert on symptoms rather than outages, giving us early warning before our customers are impacted.
  • Runbooks & Incident Response — You will create and maintain runbooks that document every action, turning findings into repeatable processes and eventually into automation. You'll participate in blameless post-mortems to prevent incidents from ever happening again.
  • Infrastructure Improvements — You will build and maintain our cloud infrastructure using Infrastructure-as-Code principles, collaborating with backend engineers to improve service reliability and reduce manual work.
  • On-Call (Business Hours) — You will participate in a business-hours on-call rotation aligned with the US timezone (roughly 9am-6pm EST/PST). Our distributed team across LATAM, US, and Spain enables a follow-the-sun model with no midnight pages.
  • Engineering Excellence — You will help us build a culture of reliability that the whole team is proud of, championing automation, documentation, and continuous improvement.


👤 What We're Looking For (Must-haves)

  • Based in Latin America with availability to work aligned with US business hours (EST or PST).
  • At least 4 years of relevant experience in SRE, DevOps, Platform Engineering, or Infrastructure roles.
  • Proficiency with Infrastructure-as-Code tools (Terraform preferred).
  • Experience with cloud platforms (AWS or GCP).
  • Strong Linux systems administration and troubleshooting skills.
  • Programming ability in Go, Python, or similar languages. This means you've solved problems by writing code to automate your way out of them.
  • Familiarity with observability and monitoring tools (Datadog, Prometheus, Grafana, or similar).
  • You take ownership of your work and enjoy the autonomy of managing projects across design, implementation, and production.
  • Great team player: humble, empathetic, open-minded.
  • Strong verbal and written communication skills in English.


🌟 Nice-to-haves

  • Experience with GCP, in particular Cloud Run, Cloud Spanner, and Cloud Monitoring.
  • Background in incident management and writing effective runbooks.
  • Experience with CI/CD pipelines and deployment automation.
  • Contributions to internal tooling or developer experience improvements.
  • You believe CI servers, push-button deploys, metrics dashboards, and centralized logging are not just "nice to haves"; they're critical infrastructure that rapidly pays for itself.


💎 Why You Should Join Agentero

💰 Salary: 45-65K EUR + equity.

🏡 Remote-first: Work from anywhere in Latin America.

💻 Home office setup budget.

🤓 Training and development budget.

Business-hours on-call: We use follow-the-sun across our distributed team.

🌍 International team: Collaborate with colleagues in Spain and the US.

✈️ Team offsites in places like Mexico City, Miami, Lisbon, and more!

🗓️ Our Recruitment Process

We respect your time, so here's what to expect:

  • Initial Call with the People Team to get to know you.
  • Role-Specific Call with our CTO to dive deeper into the nitty-gritty.
  • Technical Assessment to put your skills to the test.
  • Technical Call with our Engineering Lead.
  • Final conversation with our CEO and Founder.


🔧 Our Tech Stack & Tools

  • Cloud: GCP
  • Infrastructure-as-Code: Terraform
  • Backend: Go
  • Observability: FullStory, Looker Studio, [your monitoring tools]
  • Collaboration: GitHub, Figma
  • Incident Management: [PagerDuty/Opsgenie/etc.]


❤️ Our Values

  • Wow Stakeholders: Innovate to solve real problems simply.
  • Win It: Persistence, grit, and cross-team collaboration.
  • One Team: Success comes from supporting and challenging each other.
  • Trust & Tell: Transparency and openness in communication.
  • Own It: Accountability for results and outcomes.


If you're ready to make a meaningful impact and help us build a reliable platform that insurance agents depend on, we invite you to apply today. Let's shape the future of insurance together! 🚀
