Senior ML Ops Engineer (Databricks, Azure)

Hays Poland
Remote

Design and implement infrastructure for hosting, orchestrating, and managing ML scoring processes in a Databricks environment. Collaborate with data scientists and DevOps engineers to ensure operational excellence. Develop scalable, secure, and well-monitored platforms for data science teams.

Technical Skills Required
Databricks, Azure, Python, Terraform, MLflow, Delta Lake, Azure DevOps, GitHub Actions, Datadog, Prometheus, Grafana
Benefits & Perks
Up to PLN 135/hour net + VAT
100% remote or hybrid work option
Luxmed, MultiSport, equipment provided by the client

Job Description


The client offers comprehensive IT services throughout Europe


Basic information:

Location: 100% remote or hybrid (Warsaw/Poznań/Lublin)

Rate: up to approx. PLN 135/hour net + VAT (on-call time paid additionally, per week/month)

Type of employment: B2B contract

Duration: 12 months + extensions

Recruitment process: 2 stages

English: B2/C1



Client Description

- Our client is a leading German company at the forefront of telecommunications and IT services, specializing in advanced solutions such as web hosting, cloud computing, and internet services.

- To expand its capabilities, the client is establishing a European Technology Center in Poland.


Project Description

- We are looking for an experienced ML Ops Engineer to design and implement infrastructure for hosting, orchestrating, and managing up to 1,500 ML scoring processes within a new Databricks environment.

- The role focuses on operationalizing ML scoring pipelines by creating a scalable, secure, and well-monitored platform for data science teams to deploy models efficiently.

- You will build the operational backbone that enables data scientists to run, monitor, and manage high-volume ML scoring in production.

- Working closely with DevOps Engineers, you will ensure Databricks supports both AI workloads and traditional BI/data processing use cases, including secure access, seamless integration with downstream tools, and optimized data pipelines.


Impact of the Role

- This position is critical for scaling AI capabilities across the organization, enabling thousands of predictive scores to be calculated and monitored daily in a production-grade environment.

- Your work will accelerate AI adoption while ensuring operational excellence.


Working Model & Collaboration

- Distributed delivery model aligned with the central AI/BI team in Germany

- Daily use of remote collaboration tools (MS Teams, Jira, Confluence)

- Agile methodologies (Scrum/Kanban) in cross-functional squad

- Clear documentation and reproducibility for seamless handovers


Key Responsibilities

- Environment Setup & Configuration:

> Configure Databricks clusters, jobs, and workflows for large-scale ML scoring

> Implement Infrastructure as Code (e.g., Terraform) for reproducibility and governance

> Optimize job scheduling, parallel execution, and resource allocation

> Integrate monitoring and alerting using cloud-native tools

> Ensure security, compliance, and cost-efficiency


- ML Ops Pipeline Integration:

> Develop deployment processes for ML models using Databricks MLflow or equivalent

> Implement version control for models, scoring code, and configurations


- Execution Management:

> Build frameworks to orchestrate scoring for 1,500+ ML models/jobs

> Ensure resilience, fault tolerance, and restart capabilities
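As a rough illustration of the orchestration pattern described above (a sketch only; the job names, retry policy, and `run_scoring_job`-style callable are hypothetical examples, not part of the client's actual stack), a resilient fan-out with bounded parallelism and per-job retries might look like:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scoring-orchestrator")


def run_with_retries(job_name, scoring_fn, max_retries=3):
    """Run one scoring job, retrying on failure up to max_retries times."""
    for attempt in range(1, max_retries + 1):
        try:
            result = scoring_fn(job_name)
            log.info("job %s succeeded on attempt %d", job_name, attempt)
            return job_name, result, None
        except Exception as exc:  # in production, narrow this to transient errors
            log.warning("job %s failed attempt %d: %s", job_name, attempt, exc)
    return job_name, None, f"failed after {max_retries} attempts"


def orchestrate(jobs, scoring_fn, max_workers=16):
    """Fan out scoring jobs with bounded parallelism; collect per-job outcomes."""
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_with_retries, name, scoring_fn) for name in jobs]
        for fut in as_completed(futures):
            name, result, error = fut.result()
            outcomes[name] = result if error is None else error
    return outcomes
```

A production framework on Databricks would instead delegate `scoring_fn` to the Jobs API and persist outcomes so interrupted runs can be restarted; the sketch only shows the retry-and-fan-out shape of the problem.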


- Monitoring & Observability:

> Integrate logging, alerting, and dashboards for throughput, latency, and failures

> Establish model performance monitoring hooks


- Automation:

> Collaborate with DevOps Engineers for shared infrastructure (e.g., Delta Lake tables)

> Automate resource provisioning and deployments via CI/CD pipelines

> Utilize IaC for reproducibility


- Collaboration & Governance:

> Work closely with data scientists, architects, and platform engineers

> Define operational SLAs for scoring workloads

> Implement RBAC, credential management, and audit logging

> Ensure secure handling of model artifacts and scoring data


Technology Stack:

> Databricks (administration, ideally on Azure)

> Terraform

> Python (automation)


Key Skills:

> Azure

> Databricks

> MLOps

> Python

> Terraform


Must-Have Requirements:

> Proven experience in ML Ops within production environments

> Hands-on expertise with Databricks (MLflow, Jobs, Workflows, Delta Lake)

> Large-scale batch job orchestration and distributed computing experience

> Strong Python skills for workflow scripting and pipeline integration

> CI/CD pipeline experience (Azure DevOps, GitHub Actions, etc.)

> Proficiency with monitoring tools (Datadog, Prometheus, Grafana, or cloud-native)

> Knowledge of IaC and cloud automation

> Understanding of model lifecycle management and reproducibility


Nice-to-Have:

> Experience with high-volume ML scoring in Databricks

> Familiarity with ML operationalization best practices in regulated environments

> Knowledge of job queueing systems and parallel execution patterns

> Exposure to Azure Databricks and Azure ecosystem

> Performance tuning for large concurrent workloads

> Cost optimization strategies for ML infrastructure


Hays Poland sp. z o.o. is an employment agency registered in the registry kept by the Marshal of the Mazowieckie Voivodeship under number 361.

