Senior DevOps Engineer
Design, build, and operate secure, scalable CI/CD and infrastructure platforms for production AI and ML workloads. Collaborate with ML, MLOps, and Data Engineering teams to enable reliable model training, deployment, and scaling. Ensure infrastructure and deployment pipelines meet 99.9 percent uptime and reliability targets.
Role Overview
We are seeking a Senior DevOps Engineer to design, build, and operate secure, scalable CI/CD and infrastructure platforms that support production AI and ML workloads. This role enables Machine Learning Engineers, MLOps Engineers, and Data Engineers by ensuring reliable deployment, monitoring, and operation of mission-critical systems.
The ideal candidate is hands-on, comfortable owning production infrastructure, and experienced operating in security-conscious and compliance-driven environments.
Key Responsibilities
DevOps and Platform Engineering
· Design, implement, and maintain automated CI/CD pipelines supporting application, data, and ML deployments.
· Build, operate, and scale Kubernetes-based platforms for containerized workloads.
· Manage and optimize Linux-based production systems to ensure performance, reliability, and scalability.
· Implement infrastructure as code using tools such as Terraform or equivalent frameworks.
· Support infrastructure for data and ML platforms that handle terabyte- to petabyte-scale datasets.
Reliability, Monitoring, and Security
· Monitor system health, performance, and availability using tools such as Prometheus, Grafana, and centralized logging solutions.
· Ensure infrastructure and deployment pipelines meet 99.9 percent uptime and reliability targets.
· Partner with security and compliance teams to align infrastructure with standards such as NIST SP 800-53 and FedRAMP.
· Troubleshoot and resolve infrastructure, deployment, and performance issues in production environments.
Collaboration and Enablement
· Work closely with ML, MLOps, and Data Engineering teams to enable reliable model training, deployment, and scaling.
· Support development teams with platform tooling, documentation, and operational best practices.
· Contribute to operational runbooks, system documentation, and continuous improvement initiatives.
Required Qualifications
· U.S. citizen with an active DoD, IC, or DHS security clearance, or the eligibility to obtain and maintain one.
· Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent professional experience.
· 5 or more years of DevOps engineering experience supporting production environments.
· 5 or more years of Linux system administration experience, including performance tuning and troubleshooting.
· Hands-on experience with Kubernetes and Docker in production environments.
· Experience deploying and managing infrastructure in Azure, AWS, or GCP, with Azure experience strongly preferred.
· Proficiency with scripting languages such as Bash or Python.
Preferred Qualifications
· Experience supporting AI or ML workloads in production environments.
· Experience operating in federal, defense, healthcare, or other regulated environments.
· Familiarity with monitoring and logging stacks such as Prometheus, Grafana, and ELK.
· Experience with infrastructure as code tools such as Terraform.
· Hands-on experience with bare metal provisioning or hybrid infrastructure environments.
· Kubernetes or cloud certifications, such as Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer - Professional, or Microsoft Certified: DevOps Engineer Expert.
Benefits and Growth
· Competitive salary and comprehensive health benefits.
· 401(k) with company matching.
· Clearance sponsorship for eligible candidates.
· Training and certification support for DevOps, Kubernetes, and cloud platforms.
· Opportunity to grow into Lead DevOps or Platform Engineering roles as programs expand.
Bright Vision Technologies