Senior DevOps Engineer - Cloud Native Infrastructure Lead
Lead a global DevOps team, design and build cloud infrastructure on GCP, and drive automation, reliability, and scalability.
Job Description
Our client is a fast-growing AI technology company redefining how large-scale dynamic pricing is handled in real time. The company is partnering with some of the world’s leading airlines to transform how pricing is done at scale. They've developed a cutting-edge platform that enables enterprise clients to move beyond manual pricing models and embrace a fully autonomous, AI-driven system that dynamically adjusts to real-time market conditions.
In this role, you will lead a global DevOps team responsible for building and optimizing complex, cloud-native infrastructure supporting Fetcherr’s AI-powered platform. You’ll work on high-performance systems deployed on Google Cloud Platform (GCP), driving automation, reliability, and scalability across multiple environments. It’s an excellent opportunity to become a key expert within the organization, with real autonomy in decision-making, influence over technical direction, and an environment that actively encourages your ideas and innovation.
This position is hybrid in Miami, and the company welcomes candidates willing to relocate.
Responsibilities
Lead and mentor a team of DevOps engineers to deliver reliable, secure, and scalable infrastructure
Design, build, and improve cloud infrastructure on Google Cloud Platform (GCP) for high performance and resilience
Manage Kubernetes and Terraform environments in production, ensuring uptime and deployment efficiency
Automate CI/CD pipelines, release processes, and infrastructure management to speed up delivery and reduce errors
Set up and maintain monitoring, alerting, and logging systems (Prometheus, EFK, GCP Monitoring) for early issue detection and fast resolution
Work closely with development, data, and product teams to align infrastructure with business and technical goals
Define and enforce Infrastructure as Code (IaC) standards for consistency and reliability
Improve internal tools to simplify deployments and boost developer productivity
Evaluate new tools and technologies that can strengthen performance, security, and scalability
Requirements
6+ years in DevOps, Site Reliability Engineering, or similar positions
2+ years leading or managing engineering teams
Strong experience with Kubernetes in production and advanced Helm chart management
Deep knowledge of Terraform and Infrastructure as Code
Solid scripting skills in Bash, Python, or another relevant language
3-4 years of hands-on experience with GCP deployments and services
Experience building and maintaining CI/CD pipelines (ArgoCD, Jenkins, GitLab CI, etc.)
Strong background in monitoring and observability (Prometheus, Grafana, GCP Monitoring)
Nice to Have
Experience building Kubernetes operators or extending ArgoCD
Familiarity with Big Data or MLOps environments
Knowledge of Airflow, Kubeflow, or MLflow
The company is committed to creating a diverse environment and is proud to be an equal-opportunity employer. They provide a collaborative working environment, the resources and state-of-the-art tools and equipment needed to succeed, and a welcoming, inclusive corporate culture where individuals are recognized for their contributions.