HPC Kubernetes Solutions Architect

Coda Search│Staffing • United States

AI Summary

Design and implement high-performance computing solutions using Kubernetes and GPU acceleration. Collaborate with customers to capture workload requirements and translate them into actionable architectures. Develop scalable, secure, and resilient Kubernetes-based architectures for HPC and AI/ML use cases.

Key Highlights

Act as primary architectural point of contact for customers adopting GPU-accelerated Kubernetes platforms for HPC and AI/ML workloads

Partner with customers to capture workload requirements and translate them into reference architectures and actionable solution designs

Develop scalable, secure, and resilient Kubernetes-based architectures for HPC and AI/ML use cases

Key Responsibilities

Architect and operate Kubernetes clusters optimized for GPU workloads

Integrate and tune Multi-Instance GPU (MIG), GPU sharing, and scheduler extensions

Develop or extend custom Kubernetes operators and controllers in Go/Python

Design and recommend secure multi-tenant Kubernetes environments

Lead proof-of-concept and benchmarking engagements

Define and document integration strategies across compute, storage, networking, and orchestration layers

Drive observability and monitoring solutions with Prometheus, Grafana, DCGM Exporter, and OpenTelemetry

Support GitOps-driven CI/CD pipelines for Kubernetes infrastructure

Collaborate with HPC, ML, and DevOps teams to validate performance and scalability in hybrid or on-premise environments

Provide architectural leadership during onboarding and deployment

Build and maintain strategic relationships with ecosystem vendors

Technical Skills Required

Kubernetes architecture and operations for HPC or GPU-intensive environments

NVIDIA GPU stack (GPU Operator, device plugins, MIG, NVML, DCGM)

Kubernetes internals (CRDs, RBAC, scheduler extensions, custom operators/controllers)

Distributed and parallel storage integration with Kubernetes for HPC workloads

High-performance networking (InfiniBand, RDMA, RoCE) in containerized environments

Go or Python for Kubernetes operator or controller development

Workload profiling, benchmarking, and performance tuning

Benefits & Perks

Hybrid role in Dallas, TX with relocation available

Nice to Have

Demonstrated success in end-to-end customer solution delivery

Familiarity with containerized HPC environments

Exposure to automation and GitOps practices for Kubernetes platform management

Contributions to open-source projects in the Kubernetes or NVIDIA ecosystem

Experience advising on future adoption strategies

Job Description

One of the fastest-growing companies in high-performance computing is building a new Solutions Architecture group, and they are looking for an HPC Kubernetes Solutions Architect who can help define what next-generation compute looks like.

As an HPC Kubernetes Solutions Architect, you will act as a trusted advisor to customers, guiding them through the design, integration, and adoption of GPU-accelerated Kubernetes platforms purpose-built for high-performance computing (HPC), AI/ML training, simulation, and scientific workloads.

This is a customer-facing architecture role with accountability across the entire solution lifecycle, from early discovery and requirements analysis, through reference architecture design, proof-of-concept delivery, and deployment, to long-term optimization and platform evolution.

You will be responsible for creating architectural blueprints and integration strategies that enable customers to achieve measurable performance and scalability outcomes, while preparing them for future growth and technology shifts. In addition, you will collaborate closely with product, engineering, and operations teams, ensuring customer feedback informs roadmap priorities and helping define the next generation of Kubernetes-based HPC orchestration.

This role is ideal for someone who combines deep technical expertise in Kubernetes and GPU orchestration with the ability to engage customers as a solution strategist, aligning today’s workloads with tomorrow’s innovation.

Responsibilities:

Act as the primary architectural point of contact for customers adopting GPU-accelerated Kubernetes platforms for HPC and AI/ML workloads.
Partner with customers to capture workload requirements, performance objectives, scaling needs, and integration constraints, translating them into reference architectures and actionable solution designs.
Architect and operate Kubernetes clusters optimized for GPU workloads, leveraging NVIDIA GPU Operator, Network Operator, DCGM, and device plugins.
Integrate and tune Multi-Instance GPU (MIG), GPU sharing, and scheduler extensions (e.g., Volcano, Slurm integration, kube-scheduler plugins) to maximize efficiency in multi-tenant environments.
Develop or extend custom Kubernetes operators and controllers in Go/Python to automate HPC infrastructure services.
Design and recommend secure multi-tenant Kubernetes environments, implementing RBAC, OPA/Gatekeeper policies, namespace isolation, and workload quotas.
Lead proof-of-concept and benchmarking engagements, using profiling tools, workload characterization, and telemetry to validate solution performance and scalability.
Define and document integration strategies across compute, storage, networking, and orchestration layers, including CNI plugins (NVIDIA CNI, Multus, Cilium), storage systems (Lustre, GPFS, Ceph, VAST), and container runtimes (containerd, NVIDIA Container Toolkit).
Drive observability and monitoring solutions with Prometheus, Grafana, DCGM Exporter, and OpenTelemetry, ensuring visibility into GPU health, cluster utilization, and workload performance.
Support GitOps-driven CI/CD pipelines for Kubernetes infrastructure using ArgoCD, FluxCD, Helm, and Kustomize.
Collaborate with HPC, ML, and DevOps teams to validate performance and scalability in hybrid or on-premise environments.
Provide architectural leadership during onboarding and deployment, ensuring successful integration of Kubernetes clusters with HPC schedulers and enterprise IT systems.
Build and maintain strategic relationships with ecosystem vendors (e.g., NVIDIA, Cisco, storage partners), incorporating emerging technologies into customer environments.
Share forward-looking insights with customers on GPU roadmaps, interconnect advancements (e.g., InfiniBand, RoCE, NVLink), and container orchestration trends.
Represent the organization in customer design sessions, technical workshops, and industry conferences, positioning yourself as a thought leader in Kubernetes for HPC.

Required Skills:

Extensive experience in Kubernetes architecture and operations for HPC or GPU-intensive environments.
Strong technical expertise in:
NVIDIA GPU stack (GPU Operator, device plugins, MIG, NVML, DCGM).
Kubernetes internals (CRDs, RBAC, scheduler extensions, custom operators/controllers).
Distributed and parallel storage integration with Kubernetes for HPC workloads.
High-performance networking (InfiniBand, RDMA, RoCE) in containerized environments.
Proven ability to design scalable, secure, and resilient Kubernetes-based architectures for HPC and AI/ML use cases.
Proficiency in Go or Python for Kubernetes operator or controller development.
Experience with workload profiling, benchmarking, and performance tuning.
Strong customer engagement skills, capable of translating requirements into actionable architectures and presenting solutions effectively.
Collaborative mindset with experience working across engineering, product, and operations teams.

Preferred Experience:

Demonstrated success in end-to-end customer solution delivery, from requirements discovery to deployment and adoption.
Familiarity with containerized HPC environments (e.g., Singularity/Apptainer).
Exposure to automation and GitOps practices for Kubernetes platform management (e.g., ArgoCD, FluxCD).
Contributions to open-source projects in the Kubernetes or NVIDIA ecosystem.
Experience advising on future adoption strategies, helping customers prepare for emerging GPU, interconnect, and orchestration technologies.
Bachelor’s or Master’s degree in Computer Science, Engineering, Physics, or related technical field.
Relevant Kubernetes and container certifications such as CKA, CKAD, or CKS, alongside cloud certifications like AWS Solutions Architect or Azure Solutions Architect Expert.

This is a hybrid role in Dallas, TX, and relocation is available.

Job Overview

Posted Date: Mar 05, 2026

Employment Type: Full-time

Experience Level: Not Applicable

Location: United States

Category: DevOps

Company: Coda Search│Staffing
