interface.ai is redefining banking with AI, serving over 100 financial institutions globally. They are building BankGPT, an advanced platform leveraging large language models and multi-agent orchestration. The role involves designing scalable, resilient infrastructure to support AI workloads at scale.
Job Description
Get To Know Us First!
Who We Are
At interface.ai, we’re redefining the future of banking with AI. Our cutting-edge Generative AI-powered platform serves over 100 banks and credit unions, delivering hyper-personalized customer interactions across voice, chat, and employee-assisting solutions.
Our mission:
To make banking effortless, intelligent, and profitable—enhancing user experience while boosting revenue and efficiency for financial institutions.
We’re not just another AI company. Our proprietary AI, built 100% in-house, is designed for zero-shot learning, achieving 90%+ accuracy on Day 1. With a world-class team from Microsoft, ISB, and IIMs, and a 1,800% growth rate in the last year, we’re shaping the future of AI in financial services.
Join us to build something transformative.
Careers - https://interface.ai/open-positions
LinkedIn - https://www.linkedin.com/company/interface-ai/
Role – DevOps Engineer III
Location: India (Remote)
Function: Engineering – Product Engineering
Level: Senior
Reports to: Engineering Manager – Product Engineering
About the Role
At interface.ai, we are building BankGPT – the world’s first AI-powered digital banking platform that leverages large language models, multi-agent orchestration, real-time streaming, and voice AI. To support this mission, we are seeking a DevOps Engineer III who will own infrastructure end-to-end, design systems from scratch, and enable highly resilient AI workloads at scale.
This is a senior, hands-on role that requires deep expertise in cloud-native DevOps, infrastructure automation, observability, and security. You’ll not only build and optimize systems but also influence best practices, mentor peers, and contribute to critical decision-making around platform reliability and scalability.
What You’ll Do
- Infrastructure Ownership – Design, implement, and scale infrastructure on AWS, GCP, or Azure; drive high availability, multi-AZ, and DR/BCP strategies.
- Cloud-Native Enablement – Build and manage Kubernetes clusters (EKS/GKE), service mesh (Istio/Linkerd), and ingress controllers for secure and resilient workloads.
- CI/CD & Automation – Architect CI/CD pipelines (ArgoCD/GitOps, Jenkins) and build custom deployment portals and automation tools to accelerate developer productivity.
- AI/LLM Reliability – Define and track key metrics (latency, cost, throughput, containment) for AI/LLMs and agent workflows.
- Observability & Tracing – Implement end-to-end tracing for multi-turn queries and real-time pipelines using OpenTelemetry, Prometheus, and Grafana.
- Vector Databases – Manage and tune vector DBs (Pinecone, Weaviate, Milvus, etc.) for high concurrency, hybrid retrieval, reranking, and resilience.
- Resilience & Scaling – Design autoscaling, failover, and health-check–based routing strategies for workloads like WebSockets, RAG pipelines, and voice (STT/TTS).
- Scripting & Tooling – Write Bash/Python/Go scripts for operational tooling, log rotation, API integrations, and rollout automation.
- Collaboration – Partner with AI and engineering teams to support complex workflows, while driving DevOps best practices across the organization.
What You’ll Bring
- 5–8 years of core DevOps experience with a strong track record of building infra from scratch (not just maintaining existing systems).
- Deep expertise in Docker, Kubernetes, Helm, and container orchestration.
- Hands-on experience with Terraform, Crossplane, and declarative infrastructure management.
- Strong experience in CI/CD pipelines (ArgoCD, Jenkins, GitOps workflows) and building custom automation.
- Proven ability to deploy AI/LLMs & agent workflows reliably in production.
- Expertise in defining/tracking AI workflow metrics and observability of multi-turn queries.
- Mandatory expertise with vector databases – tuning, scaling, and optimizing retrieval performance.
- Proficiency in monitoring & logging tools (Prometheus, Grafana, OpenTelemetry, ELK/OpenSearch).
- Familiarity with service mesh (Istio/Linkerd), networking, and multi-cluster workloads.
- Proficiency in scripting/programming (Python, Bash, Go preferred).
- Knowledge of security best practices in cloud environments (IAM, secrets, secure networking).
Bonus Points:
- Experience working on AI-enabled or ML-integrated platforms
- Understanding of compliance, security, and auditability requirements in regulated environments
- Prior experience working in fast-paced, high-growth product teams
Why Join Us?
- Remote-first culture – Work from anywhere, with top-tier colleagues.
- High ownership, high impact – Your work will define the future of banking.
- Comprehensive Benefits – We take care of our people.