Senior Software Engineer - ML Data Platform

DuckDuckGoose AI — Netherlands

Job Description


Senior Software Engineer — ML Data Platform


Location: Delft, the Netherlands (hybrid)

Type: Full-time

Start: ASAP


The internet has entered an era where reality is generatable. We build infrastructure that helps institutions distinguish real media from synthetic at scale, protecting citizens, enterprises, and governments from synthetic media fraud. Everything you see and hear online can now be manipulated; our job is to make sure people can trust what they see. As part of our forensics platform team, you'll work on the data backbone that makes large-scale detection possible, from ingestion and versioning through training, evaluation, and production.


You’ll join a small, senior team where your work will have immediate impact, and you’ll have ownership over the systems you build.


You’ll work on technically challenging problems such as:

  • Building dataset lineage for rapidly evolving generative models
  • Tracking model-family clusters across synthetic media types
  • Designing reproducible forensic benchmarks at scale
  • Managing large-scale image/video datasets with auditable provenance
  • Creating deterministic dataset builds for research and production environments
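To make "deterministic splits" concrete: one common approach (a minimal sketch, not a description of this company's actual stack; the function name and ratio scheme are hypothetical) is to derive each item's split from a hash of its ID, so an item lands in the same split on every rebuild regardless of insertion order or parallelism:

```python
import hashlib

def assign_split(
    item_id: str,
    ratios=(("train", 0.8), ("val", 0.1), ("test", 0.1)),
    seed: str = "v1",
) -> str:
    """Deterministically map an item ID to a split.

    The same (seed, item_id) pair always yields the same split,
    independent of process, ordering, or dataset size.
    """
    digest = hashlib.sha256(f"{seed}:{item_id}".encode()).digest()
    # Interpret the first 8 bytes as a uniform value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, ratio in ratios:
        cumulative += ratio
        if bucket < cumulative:
            return name
    return ratios[-1][0]  # guard against float rounding at the boundary
```

Bumping the `seed` string reshuffles all assignments at once, which gives a cheap way to version splits alongside the dataset itself.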


What You’ll Drive

  • Data platform architecture: Define unified schemas, lineage, and dataset versioning for large image/video + context data.
  • Ingestion at scale: Build reliable pipelines from research repos, APIs, and internal generators; automate connectors and jobs.
  • Quality & governance: Implement deduplication, validation, health dashboards, and drift/coverage checks with auditable lineage.
  • Curation & access: Deliver one-command dataset builds, deterministic splits, and fast sampling tools for training/eval.
  • Performance & cost: Tune S3/object storage layouts, partitioning, and lifecycle policies for speed and spend.
  • Orchestration & ops: Productionize pipelines with CI/CD, containerization, scheduling/monitoring, and safe rollbacks.
  • Reliability & operations: Build for simplicity and observability; participate in a planned, compensated support rotation.
  • Engineering productivity: Create internal tools/CLIs, docs, and templates that make everyone faster.
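The deduplication work under "Quality & governance" has a simple exact-match baseline worth naming: content hashing. A minimal sketch (the function and its in-memory interface are illustrative, not the platform's actual API; real pipelines would also need perceptual hashing for near-duplicates):

```python
import hashlib

def find_exact_duplicates(items):
    """Detect byte-identical payloads by content hash.

    items: iterable of (item_id, raw_bytes) pairs.
    Returns a list of (duplicate_id, kept_id) pairs, where the first
    item seen with a given digest is the one kept.
    """
    seen = {}       # digest -> first item_id with that content
    duplicates = []
    for item_id, data in items:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append((item_id, seen[digest]))
        else:
            seen[digest] = item_id
    return duplicates
```

Wired into CI as a QC gate, a non-empty result can fail the dataset build before duplicated samples leak across train/test splits.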


Must-haves

  • Strong software engineering foundation: Master’s in Computer Science, Data Engineering, or a related field.
  • Production experience: 5–8+ years building and operating data platforms for large unstructured datasets (images/video).
  • Data lifecycle ownership: Ingest → validate → catalog → version → sample/serve → monitor.
  • Pipelines & orchestration: Experience with modern schedulers (e.g., Airflow/Prefect) and containerized jobs.
  • Storage & formats: Hands-on with object storage (e.g., S3), columnar formats/partitioning, and performance tuning.
  • Versioning & lineage: Experience with dataset versioning and reproducibility (e.g., DVC/lakeFS/Delta or equivalents).
  • Quality at scale: Deduplication, schema/label checks, and automated QC gates in CI.
  • Security & privacy: IAM, access controls, and privacy-aware workflows suitable for regulated customers.
  • Domain awareness: Familiarity with digital forensics, misinformation threats, or synthetic media — and willingness to deepen expertise.
  • Flexibility: Comfortable moving between data engineering, infra, and tooling tasks when needed.
  • Mindset & delivery: Thrive in a fast-moving environment; proactive problem-solver; ship, measure, simplify.
  • Communication: Excellent written and verbal skills; explain complex ideas clearly.
  • Independence: Deliver quality work on time without constant oversight.
  • Language: Fluent in English.


Nice-to-haves

  • Streaming & events: Kafka/Kinesis or similar for near-real-time ingestion.
  • Vector search: Experience with embedding stores or similarity search at scale.
  • Synthetic data: Building pipelines to generate/stress-test rare scenarios.
  • Cloud & on-prem: Terraform/CDK, Kubernetes, and hybrid/on-prem data deployments.
  • FinOps: Cost monitoring and optimization for data workloads.
  • Technical track record: Strong GitHub, open-source contributions, publications, patents, or public talks.
  • Leadership: Mentoring and guiding technical direction.
  • Dutch language: Fluency is a plus.


Key Deliverables (First 90 Days)

  • A unified schema + catalog with key datasets onboarded, versioned, and reproducibly built via one command.
  • Automated QC gates (dedup/validation) with a red/amber/green dataset health dashboard and clear lineage.
  • Fast sampling/curation tools for the ML team, plus cost controls (storage layouts, lifecycle policies) in place.
  • Data migration: Inventory and migrate existing/legacy datasets into the new platform; reformat to the new schema, backfill metadata, validate checksums/lineage, and deprecate legacy paths with a rollback plan.


Compensation & benefits

  • Own the backbone: Define schemas, lineage, and dataset versioning used across research and production.
  • Company participation: Meaningful equity/virtual shares aligned with company growth.
  • Flexible work: Hybrid (Delft), flexible hours, minimal ceremony, async-first collaboration.
  • Data platform mandate: Real say in stack choices (orchestration, catalog, storage/layout) and time to implement them right.
  • Repro & auditability: Space to enforce deterministic builds, splits, and traceable lineage—no heroics needed.
  • Quality culture: Backing to implement dedup, drift/coverage checks, and dataset health dashboards org-wide.
  • FinOps mindset: Budget and support to balance speed, reliability, and total cost.
  • Pragmatic on-call: Planned, compensated rotation with automation-first recovery and rollback plans.
  • Growth path: IC track to Staff/Principal; opportunities to mentor and codify data standards.
  • Learning budget: Annual budget for courses/books + two data/ML-infra conferences per year.
  • Home office: Modest stipend for an ergonomic setup; commuting support (public transport or mileage).
  • Relocation + visa: Visa sponsorship and relocation support for internationals.


Join us and be part of a company committed to creating a more secure and trustworthy digital future. Apply today to become part of our mission-driven team!

