Senior DevOps Engineer - Cloud, Containerization, and Automation
Design and maintain CI/CD pipelines, automate infrastructure provisioning, and manage cloud environments for scalability and cost efficiency. Implement containerization and orchestration, monitor system performance, and establish SLA/SLO/SLI metrics. Collaborate with development teams to troubleshoot issues and optimize deployments.
Job Description
Job Title: Senior SRE / DevOps Engineer (W2 only; sponsorship available)
Duration: Long Term
Location: Merrimack, NH/Smithfield, RI - Hybrid
*REQUIRED SKILLS*
1) CI/CD & Automation: Design and maintain pipelines using tools like GitHub, Jenkins, Maven, and uDeploy; automate infrastructure with IaC tools such as Terraform or CloudFormation.
2) Cloud & Containerization: Manage AWS environments for scalability and cost efficiency; implement Docker and Kubernetes for orchestration.
3) Monitoring & Reliability: Use tools like Prometheus, Grafana, ELK, or Datadog for performance and security monitoring; establish SLA/SLO/SLI metrics and ensure high availability/disaster recovery.
4) Technical Expertise: Strong experience with Linux/Windows OS, scripting (Python, Bash), and SQL.
5) Collaboration & Incident Management: Partner with development teams to troubleshoot issues, participate in root cause analysis, and contribute to continuous improvement and resilience strategies.
Key Responsibilities
- Assist with the root cause analysis process and provide feedback.
- Design, implement, and maintain CI/CD pipelines using tools such as GitHub, Maven, Jenkins Core, and/or uDeploy.
- Automate infrastructure provisioning and configuration using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Manage and optimize cloud environments (AWS, Azure) for scalability and cost efficiency.
- Implement containerization and orchestration using Docker and Kubernetes.
- Monitor system performance, availability, and security using Prometheus, Grafana, ELK Stack, or Datadog.
- Establish and maintain SLA/SLO/SLI metrics to ensure reliability and performance standards.
- Participate in incident response, root cause analysis, and postmortem reviews to improve system resilience.
- Collaborate with development teams to troubleshoot application-level issues and optimize deployments.
- Work with messaging and streaming frameworks such as Kafka and/or MQ.
- Ensure high availability and disaster recovery strategies are implemented and tested.
- Contribute to Technology Lifecycle Management (TLM) initiatives and reporting.