Senior Cloud Platform Engineer (Azure/AWS) - AI/ML & API Focus

ebp Global • Portugal
Remote
AI Summary

Seeking a skilled Cloud Platform Engineer with Azure and AWS expertise to design, implement, and manage cloud infrastructure. Responsibilities include architecting AI/ML workloads, MLOps pipelines, and robust API solutions. Requires 5+ years of network engineering experience, with 3+ in cloud environments.

Key Highlights
Design and implement scalable and secure network architectures in Azure and AWS.
Architect AI/ML workloads, MLOps pipelines, and API hosting environments.
Requires 5+ years of network engineering experience, with 3+ years focused on cloud environments.
Technical Skills Required
Azure, AWS, Terraform, CloudFormation, ARM templates, Python, PowerShell, Bash, MLflow, Kubeflow, Azure DevOps, Docker, Kubernetes, EKS, AKS, AWS SageMaker, Azure Machine Learning, Pinecone, Weaviate, pgvector, AWS API Gateway, Azure API Management, FastAPI, Flask, Django REST Framework, OAuth2, JWT, mTLS, API key management, OpenAPI, Swagger

Job Description


Cloud Platform Engineer (m/f)

šŸ“Ā Portugal |Ā šŸ•’Ā Full-TimeĀ | Remote


Company Description

ebp Global is a high-performing boutique consultancy firm best known for delivering tailored, impactful solutions to our clients’ most complex problems, from conceptualisation to implementation. Our expertise covers a wide range of value chain activities, from strategy, organisational design and operating models, through operations and business process optimisation, to information flows and analytics. It is through our hands-on approach and deep knowledge that we are proud to count some of the world’s best-known companies, across a wide variety of industries, as long-term client partners.


We are uniquely global: we not only operate on a global scale but also work in a global way, with one another and with our clients. Our team is made up of experts with operational, industry-related experience, which gives us a true understanding of our clients’ problems and a passion to solve and improve them.


See https://ebp-global.com/ for further details about our company.


Job Overview

We are seeking a highly skilled and experienced Cloud Platform Engineer with expertise in Azure and AWS to join our dynamic IT team.

The ideal candidate will be responsible for designing, implementing, and managing our cloud architecture and infrastructure, ensuring the highest levels of availability, performance, and security. Overall, you’ll strive for efficiency by aligning cloud systems with business goals.


You will work closely with colleagues to gather requirements and translate them into solutions, and contribute to the delivery of robust, supportable and sustainable infrastructure solutions, in line with agreed organisational standards, that keep services resilient, scalable and future-proof.


You are a self-starter with an inquisitive nature who looks beyond the obvious to explore why things are the way they are. Critical and conceptual thinking and problem-solving skills are essential, alongside a passion for networking.


Job Responsibilities


  • Design and Architecture:

- Design and implement scalable and secure network architectures in both Azure and AWS environments.

- Develop comprehensive architectural blueprints and documentation for cloud infrastructure.

- Plan and execute cloud migration strategies, including hybrid cloud solutions.

- Design infrastructure for AI/ML workloads including GPU/TPU compute clusters, high-throughput storage, and low-latency networking between nodes

- Architect MLOps pipelines integrating model training, versioning, and deployment workflows on cloud platforms (e.g., Azure ML, AWS SageMaker)


  • Infrastructure Setup and Management:

- Deploy and manage virtual networks, subnets, route tables, and network gateways.

- Implement and manage VPN connections, Direct Connect (AWS), and ExpressRoute (Azure).

- Configure and manage load balancers, firewalls, and security groups.

- Oversee DNS setup and management within cloud environments.

- Deploy and manage AI-specific services such as AWS SageMaker, Azure Machine Learning, and GPU-enabled VM fleets

- Set up and manage vector databases (e.g., Pinecone, Weaviate, pgvector on RDS) and object storage optimized for large model artifacts

- Configure container orchestration (Kubernetes/EKS/AKS) for scalable model serving and inference endpoints

- Deploy and manage API hosting environments including containerized REST APIs using Docker and Kubernetes (EKS/AKS)

- Configure and manage API Gateways (AWS API Gateway, Azure API Management) for routing, throttling, and versioning


  • Security and Compliance:

- Implement and maintain robust security protocols to safeguard cloud infrastructure.

- Conduct regular security audits and compliance checks.

- Ensure cloud infrastructure adheres to industry standards and regulatory requirements.

- Implement data governance and access controls for sensitive training datasets and model artifacts

- Ensure compliance with AI-specific regulations and responsible AI frameworks (e.g., EU AI Act considerations)


  • Performance Optimization:

- Monitor network performance and implement tuning measures to optimize throughput and latency.

- Troubleshoot and resolve network-related issues promptly.

- Conduct capacity planning and scaling to accommodate growing workloads.

- Optimize inference latency and throughput for deployed models using techniques like auto-scaling endpoints, spot instances, and caching layers

- Monitor GPU utilization, model drift, and endpoint health using tools like CloudWatch, Azure Monitor, or Prometheus


  • Automation and Scripting:

- Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or ARM templates.

- Automate deployment, configuration, and management tasks using scripting languages such as Python, PowerShell, or Bash.

- Build and maintain CI/CD pipelines for model deployment using tools like MLflow, Kubeflow, or Azure DevOps

- Automate model retraining triggers, A/B deployment rollouts, and blue/green model switches

- Deploy Python-based REST APIs using frameworks such as FastAPI or Flask

- Build CI/CD pipelines for automated testing, containerization, and deployment of Python APIs to cloud environments


  • AI/ML Platform Support:

- Support LLM and generative AI deployments including API gateway configuration for models like Azure OpenAI or AWS Bedrock

- Manage prompt caching layers, rate limiting, and cost monitoring for AI API consumption

- Collaborate with data science and AI teams to translate model requirements into scalable cloud infrastructure


  • Collaboration and Support:

- Work closely with development, operations, and security teams to ensure seamless integration and operation of cloud services.

- Provide technical guidance and support to junior network engineers and other team members.

- Participate in on-call rotation for after-hours support as needed.


  • API Development & Management:

- Design, deploy, and manage RESTful APIs built in Python (FastAPI, Flask, or Django REST Framework)

- Manage the full API lifecycle: versioning, documentation (Swagger/OpenAPI), deprecation, and rollout strategies

- Implement API security best practices including OAuth2, API key management, rate limiting, and JWT authentication

- Monitor API performance, uptime, and error rates using tools like CloudWatch, Azure Monitor, or Datadog

- Manage API monetization or access tiers where applicable, using gateway-level policies


Key Skills for a Cloud Platform Engineer

  • Minimum of 5 years of experience in network engineering, with at least 3 years focused on cloud environments.
  • Proven experience designing and managing network infrastructure in both Azure and AWS.

Education:

  • Bachelor's degree in Computer Science, Information Technology, or a related field. Relevant certifications and experience may be considered in lieu of a degree.

Certifications (Preferred):

  • AWS Certified Solutions Architect – Professional or AWS Certified Advanced Networking – Specialty.
  • AWS Certified Machine Learning – Specialty
  • Microsoft Certified: Azure AI Engineer Associate
  • Relevant network certifications (e.g., Cisco CCNA/CCNP).

Technical Skills:

  • Proficiency with Python REST API development and deployment (FastAPI, Flask)
  • Hands-on experience with AWS API Gateway or Azure API Management (APIM)
  • Familiarity with OpenAPI/Swagger specifications and API documentation practices
  • Understanding of API security standards — OAuth2, JWT, mTLS, API key rotation
  • Experience with containerizing APIs using Docker and deploying via Kubernetes or serverless functions (Lambda, Azure Functions)

Soft Skills:

  • Accuracy and attention to detail
  • Problem-solving aptitude is essential
  • Excellent communication and presentation skills
  • Ability to learn and upgrade technical skills in the fast-paced data analysis field
  • Ability to understand and visualize the multidimensionality of business facts/measures
  • Ability to work in a dynamic, agile environment within a geographically distributed team


Why ebp Global?

  • Boutique, high-expertise consulting firm
  • Remote, flexible working environment
  • Global team
  • Direct exposure to senior industry experts
  • Visible impact on company growth


Please apply by sending your CV (in English) to info@ebp-global.com


Applicants must reside and have the right to work in Portugal.

Only short-listed candidates will be contacted.


Personal data collected will be used for recruitment purposes only.

