Data Expert (Analytics + Monitoring + Observability)

Runware • United Kingdom

Remote


AI Summary

Runware is seeking a Data Expert to provide visibility and insights into how its high-performance AI media-creation platform performs. The ideal candidate has experience with data analytics, observability, and monitoring, and can turn raw metrics into actionable insights.

Key Highlights

Provide visibility and insights into the performance of the Runware platform

Build and maintain E2E inference time tracking and monitor how implementation changes affect request latency

Implement metrics, logs, and traces to improve system observability

Select and maintain tooling for data pipelines and build dashboards for technical and non-technical teams

Technical Skills Required

Prometheus, Grafana, Datadog, OpenTelemetry, ELK, BigQuery, Python, FastAPI, Node.js

Benefits & Perks

Remote-first collective with flexible hours

Generous paid time off

Meaningful stock options

Family leave

Company retreats

Job Description

About Runware

Runware is building a high-performance, full-stack AI media-creation platform — empowering developers and companies to generate any type of media instantly. As we scale fast and integrate increasingly complex models, we need stronger visibility, analytics, and monitoring across the whole platform stack.

We're looking for a Data Expert (Analytics + Monitoring + Observability) to help us better understand, measure, and optimize how the Runware platform performs at scale — internally and for our clients.

🎯 Mission

Your main goal is to give Runware full visibility over:

End-to-end inference performance
Integration usage and model activity
Errors, delays, bottlenecks, regressions
Internal and client-facing analytics dashboards
Health and performance of production pipelines

You will provide the data insights that allow engineering, ML, backend, DevOps, and leadership to make informed decisions — and to continuously improve performance and reliability.

🧩 What You Will Do

Performance Monitoring & Benchmarking

Build and maintain E2E inference time tracking (global and per-model).
Monitor how implementation changes impact total request latency.
Detect regressions introduced by suboptimal code paths.
Provide automated alerts & historical trends.

Usage & Analytics Reporting

Build dashboards for internal use (engineering, product, leadership).
Provide client-facing usage dashboards (requests, errors, success rate, performance).
Support clients who need visibility to debug their integrations.
Track model-level usage, API endpoint usage, adoption metrics, etc.

Platform Observability

Implement metrics, logs, and traces that help the entire platform scale smoothly.
Work closely with DevOps & backend teams to improve system observability.
Provide insights that guide infra decisions (GPU allocation, autoscaling, caching, batching, etc.).

Data Infrastructure Ownership

Select and maintain tooling (e.g., Prometheus/Grafana, Datadog, OpenTelemetry, ELK, BigQuery).
Ensure data pipelines are reliable, accessible, and always up-to-date.
Build simple, easy-to-read dashboards for both technical and non-technical teams.

Requirements

Must-Have

Strong experience with data analytics, observability, or monitoring
Hands-on experience with metrics/logging/tracing frameworks (Prometheus, Grafana, Datadog, New Relic, etc.)
Good understanding of backend systems and distributed architectures
Ability to turn raw metrics into actionable insights
Experience building dashboards for internal and external stakeholders
Familiarity with AI model monitoring (latency, throughput, error codes, GPU utilization)

Nice-to-Have

Experience with AI/ML infrastructure, inference pipelines, GPUs
Understanding of Python APIs, FastAPI, or Node environments
Experience working with high-throughput real-time systems
Startup or scale-up experience

What You Bring

A problem-solver mindset
Proactivity — you like digging into the data and flagging problems before anyone else sees them
Ability to work with ML, backend, DevOps, and product teams
Comfort with autonomous ownership

You help Runware go from "it works" to "we know exactly how well it works — and how to make it better."

Benefits

We're a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.

Our release cycles are fast and intense, but they're followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back stronger than ever for the next leap.

Generous paid time off - vacation, sick days, public holidays
Meaningful stock options - share in the upside you create
Remote-first setup - work from home anywhere we can employ you
Flexible hours - own your schedule outside core collaboration blocks
Family leave - paid maternity, paternity, and caregiver time
Company retreats - twice-yearly gatherings in inspiring locations

Please note: We are unable to offer visa sponsorship in the UK at this time. Candidates must have existing right to work in the UK.

Job Overview

Posted Date: Nov 29, 2025
Employment Type: Full-time
Experience Level: Mid-Senior level
Location: United Kingdom
Category: Data Science
Company: Runware
