Senior Machine Learning Engineer for Video and Audio Event Detection

5crest • Pakistan

Remote

AI Summary

Design, build, and improve systems that detect structured events from video and audio streams in controlled environments. Key responsibilities include building the event detection pipeline, audio-based event detection, and multimodal fusion. Strong Python skills and experience with PyTorch are required.

Technical Skills Required

Python, PyTorch, YOLO, Transformer, Whisper, TensorRT, ONNX, Quantization

Job Description

ML Engineer — Video & Audio → Text Event Detection

Location: Remote

Level: Mid to Senior

Reports to: Engineering Lead / Head of ML

For this role, we are hiring only candidates who are based in Pakistan.

About the Company

We are an early-stage company building machine learning–powered visibility for time-sensitive, high-stakes environments. Our platform leverages video and audio from fixed cameras to detect structured workflow events, enabling real-time coordination and insights—while maintaining a strong focus on privacy and compliance.

⚠️ Important Requirement (Please Read Before Applying)

We are specifically looking for professionals who can demonstrate and present their work.

All candidates must be able to:

Showcase real-world projects or systems they have built or contributed to
Clearly explain their role, decisions, and impact
Demonstrate high professional standards in their current or previous positions

Applications without demonstrable work or the ability to present it will not be considered.

Role Summary

As an ML Engineer, you will design, build, and improve systems that detect structured events from video and audio streams in controlled environments. You will work across computer vision, speech-to-text pipelines, and multimodal ML systems.

Key Responsibilities

Event Detection Pipeline

Build and optimize object detection systems (e.g., YOLO-based models)
Develop temporal models (e.g., transformer-based) for event classification
Optimize inference for edge (e.g., Jetson) and cloud environments

Audio-Based Event Detection

Implement speech-to-text pipelines (e.g., Whisper)
Detect protocol or safety-related events using keyword/phrase recognition
Ensure anonymization and timestamp accuracy for downstream use

Multimodal Fusion

Combine video and audio signals for improved detection accuracy
Define fusion strategies and confidence calibration

Training & Evaluation

Design annotation strategies and leverage active learning
Define and track key metrics (accuracy, F1, false positives, temporal precision)

Model Lifecycle

Manage model versioning, training, and deployment
Support A/B testing, monitoring, and rollback strategies

Documentation

Maintain clear documentation for models, experiments, and design decisions

Required Qualifications

Bachelor’s degree (or equivalent experience) in a relevant technical field
3+ years of hands-on experience in at least two of the following:
Computer vision (object detection, tracking, activity recognition)
Speech recognition or NLP for event detection
Multimodal ML systems

Strong Python skills and experience with PyTorch (or similar frameworks)
Experience with inference optimization (TensorRT, ONNX, quantization)
Experience building and evaluating ML training pipelines
Ability to work from structured requirements and iterate with stakeholders
Strong communication skills in a collaborative, remote environment

Preferred Qualifications

Experience in healthcare or other high-stakes, real-time systems
Familiarity with edge deployment (e.g., NVIDIA Jetson) and/or cloud ML (e.g., AWS)
Experience with privacy-aware ML and data handling
Knowledge of multi-object tracking (e.g., ByteTrack, BoT-SORT)
Experience with Whisper-based pipelines and voice activity detection

Nice to Have

Exposure to clinical or regulated environments
Experience with structured workflows and event sequencing
Interest in explainability and confidence calibration
Experience working in distributed, remote teams

What We Offer

Fully remote, globally distributed team
Opportunity to own and shape a core ML pipeline
Work on meaningful, real-world ML applications in high-impact environments
Collaborative and fast-moving engineering culture
Competitive compensation, benefits, and equity (based on experience)

Job Overview

Posted Date: Mar 22, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: Pakistan

Category: Machine Learning

Company: 5crest
