Job Description
About The Company
Bydrec, Inc. is a California-based company that connects top Tech talent from Latin America with U.S. companies looking to expand their development teams. Learn more at bydrec.com.
About The Role
We are looking for a highly experienced Principal MLOps Engineer to lead the design, deployment, and optimization of scalable machine learning infrastructure. This role is critical to ensuring ML models move reliably from experimentation to production while meeting performance, security, and compliance standards.
You will work closely with data scientists, software engineers, and platform teams to define and drive MLOps best practices, enabling robust and production-ready AI solutions.
This is a fully remote role, open to candidates based in Mexico.
Key Responsibilities
- Design, build, and maintain scalable MLOps pipelines for model training, deployment, monitoring, and lifecycle management.
- Lead the implementation and operation of containerized ML workloads using Kubernetes (K8s).
- Collaborate with data science and engineering teams to productionize machine learning models.
- Automate model versioning, rollback strategies, performance tracking, and monitoring.
- Ensure high availability, security, and compliance of ML infrastructure and workflows.
- Develop and manage infrastructure as code using tools such as Terraform and Helm.
- Define, document, and enforce best practices for model governance, reproducibility, and reliability.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred).
- 5–10 years of experience in MLOps, DevOps, or software engineering roles.
- Strong hands-on experience with Kubernetes and container orchestration.
- Proficiency in Python and Bash scripting.
- Experience working with ML frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP).
- Familiarity with CI/CD pipelines and monitoring/observability tools.
Nice to Have
- Experience with Kubeflow, MLflow, or similar MLOps platforms.
- Exposure to data versioning tools such as DVC or LakeFS.
- Knowledge of model explainability, governance, and compliance frameworks.
- Contributions to open-source MLOps or DevOps projects.
Benefits & Perks
- 100% remote work while being part of a U.S.-based team
- Opportunity to work on large-scale, production-grade AI systems
- High-impact, leadership-level role in MLOps and cloud-native environments
- Collaborative and international work culture