We're looking for a skilled Site Reliability Engineer to improve our observability stack, create runbooks, and build infrastructure improvements. The ideal candidate is passionate about reliability, has experience with Infrastructure-as-Code tools, and is proficient in Linux systems administration and troubleshooting. This is a remote-first opportunity with a competitive salary and benefits package.
Job Description
🔍 We’re on the lookout for a Site Reliability Engineer!
45-65K EUR | Full Remote (Latam) | Series A startup backed by top US VCs.
At Agentero we believe in simple and smart solutions for complex problems.
We are building cutting-edge technology to help insurance agents serve their customers more effectively and help them grow their businesses. We do so through a data-driven platform that provides insurance agents with market access to digital carriers.
Agentero is a remote-first Silicon Valley startup with a top-talent team spread across Spain and the US. We've raised a $13.5M Series A, bringing our total funding to over $20M, with participation from top investors such as Foundation Capital (Uber, Netflix), USV (Twitter, MongoDB), and Mundi Ventures (WeFox). It's going to be huge! This is just the start…
🚀 The Opportunity
We're in search of a skilled Site Reliability Engineer based in Latin America to join our engineering team. This role is aligned with US business hours, enabling our follow-the-sun on-call model across our distributed team.
You'll work on improving our observability stack, creating runbooks that transform incident learnings into automation, and building infrastructure improvements that scale with our growth. This role is ideal for someone who is passionate about reliability, allergic to manual toil, and believes that every incident is an opportunity to make the system better.
📣 What You'll Do
- Observability & Monitoring — You will design and implement monitoring solutions that alert on symptoms rather than outages, giving us early warning before our customers are impacted.
- Runbooks & Incident Response — You will create and maintain runbooks that document every action, turning findings into repeatable processes and eventually into automation. You'll participate in blameless post-mortems to prevent incidents from ever happening again.
- Infrastructure Improvements — You will build and maintain our cloud infrastructure using Infrastructure-as-Code principles, collaborating with backend engineers to improve service reliability and reduce manual work.
- On-Call (Business Hours) — You will participate in a business-hours on-call rotation aligned with US time zones (roughly 9am-6pm EST/PST). Our distributed team across LATAM, the US, and Spain enables a follow-the-sun model with no midnight pages.
- Engineering Excellence — You will help us build a culture of reliability that the whole team is proud of, championing automation, documentation, and continuous improvement.
What You'll Bring
- Based in Latin America with availability to work aligned with US business hours (EST or PST).
- At least 4 years of relevant experience in SRE, DevOps, Platform Engineering, or Infrastructure roles.
- Proficiency with Infrastructure-as-Code tools (Terraform preferred).
- Experience with cloud platforms (AWS or GCP).
- Strong Linux systems administration and troubleshooting skills.
- Programming ability in Go, Python, or similar languages. This means you've solved problems by writing code to automate your way out of them.
- Familiarity with observability and monitoring tools (Datadog, Prometheus, Grafana, or similar).
- You take ownership of your work and enjoy the autonomy of managing projects across design, implementation, and production.
- Great team player: humble, empathetic, open mindset.
- Strong verbal and written communication skills in English.
Nice to Have
- Experience with GCP, in particular with Cloud Run, Cloud Spanner, and Cloud Monitoring.
- Background in incident management and writing effective runbooks.
- Experience with CI/CD pipelines and deployment automation.
- Contributions to internal tooling or developer experience improvements.
- You believe CI servers, push-button deploys, metrics dashboards, and centralized logging are not just "nice to haves"; they're critical infrastructure that rapidly pays for itself.
💰 Salary: 45-65K EUR + equity.
🏡 Remote-first: Work from anywhere in Latin America.
💻 Home office setup budget.
🤓 Training and development budget.
⏰ Business-hours on-call: We use follow-the-sun across our distributed team.
🌍 International team: Collaborate with colleagues in Spain and the US.
✈️ Team offsites in places like Mexico City, Miami, Lisbon, and more!
🗓️ Our Recruitment Process
We respect your time, so here's what to expect:
- Initial Call with the People Team to get to know you.
- Role Specific Call with our CTO to dive deeper into the nitty gritty.
- Technical Assessment to put your skills to the test.
- Technical Call with our Engineering Lead.
- Final conversation with our CEO and Founder.
Tech Stack
- Cloud: GCP
- Infrastructure-as-Code: Terraform
- Backend: Go
- Observability: FullStory, Looker Studio, [your monitoring tools]
- Collaboration: GitHub, Figma
- Incident Management: [PagerDuty/Opsgenie/etc.]
Our Values
- Wow Stakeholders: Innovate to solve real problems simply.
- Win It: Persistence, grit, and cross-team collaboration.
- One Team: Success comes from supporting and challenging each other.
- Trust & Tell: Transparency and openness in communication.
- Own It: Accountability for results and outcomes.