This role involves managing and optimizing large-scale high-performance computing infrastructure, with a focus on GPU clusters for AI/ML workloads. The engineer will automate provisioning, ensure system stability, and troubleshoot hardware and network performance issues. The position offers a dynamic, remote, experimentation-driven environment at the intersection of luxury fashion and technology.
Job Description
About
We are backed by some of the most influential names in luxury fashion globally. We blend advanced 3D rendering, AI and VFX techniques to deliver unparalleled shopping experiences for luxury fashion.
Role
We are hiring a Platform Engineer to manage and optimise our next-generation high-performance computing infrastructure. You will move beyond standard cloud instances and manage the raw power of bare-metal GPU clusters.
Responsibilities:
- Automate the provisioning and lifecycle of high-performance GPU clusters using Terraform and Ansible.
- Maintain the stability and performance of large-scale Linux environments supporting AI/ML training workloads.
- Collaborate with vendors and internal teams to troubleshoot hardware and networking bottlenecks (latency, throughput).
- Implement monitoring solutions (Prometheus/Grafana) to visualise GPU health and cluster efficiency.
- Assist in optimising the stack for containerised workloads (Kubernetes/Docker).
Requirements:
- Strong background in Linux Systems Administration.
- Experience managing bare-metal servers (on-premises or hosted, e.g. Packet/Equinix Metal).
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Nice to have: Exposure to GPUs, InfiniBand, or high-throughput networking (we will train the right candidate).
What working with CATCHES is like:
- Fully remote-first, async-friendly, with optional co-working allowances.
- High-trust, low-bureaucracy environment that values experimentation and shipping.
- Early influence on product, architecture and engineering culture.
- Cutting-edge tech, luxury-fashion creativity, and games-industry scale challenges combined.