Senior Machine Learning Engineer for Video and Audio Event Detection

5crest Pakistan
Remote
AI Summary

Design, build, and improve systems that detect structured events from video and audio streams in controlled environments. Key responsibilities span the event detection pipeline, audio-based event detection, and multimodal fusion. Strong Python skills and experience with PyTorch are required.

Key Responsibilities
Event Detection Pipeline
Audio-Based Event Detection
Multimodal Fusion
Training & Evaluation
Model Lifecycle
Documentation
Technical Skills Required
Python, PyTorch, YOLO, Transformer, Whisper, TensorRT, ONNX, Quantization

Job Description


ML Engineer — Video & Audio → Text Event Detection

Location: Remote

Level: Mid to Senior

Reports to: Engineering Lead / Head of ML

For this role, we are only hiring candidates who are based in Pakistan.

About the Company

We are an early-stage company building machine learning–powered visibility for time-sensitive, high-stakes environments. Our platform leverages video and audio from fixed cameras to detect structured workflow events, enabling real-time coordination and insights—while maintaining a strong focus on privacy and compliance.

⚠️ Important Requirement (Please Read Before Applying)

We are specifically looking for professionals who can demonstrate and present their work.

All candidates must be able to:

  • Showcase real-world projects or systems they have built or contributed to
  • Clearly explain their role, decisions, and impact
  • Demonstrate high professional standards in their current or previous positions

Applications without demonstrable work or the ability to present it will not be considered.

Role Summary

As an ML Engineer, you will design, build, and improve systems that detect structured events from video and audio streams in controlled environments. You will work across computer vision, speech-to-text pipelines, and multimodal ML systems.

Key Responsibilities

Event Detection Pipeline

  • Build and optimize object detection systems (e.g., YOLO-based models)
  • Develop temporal models (e.g., transformer-based) for event classification
  • Optimize inference for edge (e.g., Jetson) and cloud environments

Audio-Based Event Detection

  • Implement speech-to-text pipelines (e.g., Whisper)
  • Detect protocol or safety-related events using keyword/phrase recognition
  • Ensure anonymization and timestamp accuracy for downstream use
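
As a rough illustration of the keyword/phrase recognition work described above, here is a minimal sketch in Python. It assumes transcript segments arrive as dictionaries with `start`, `end`, and `text` fields; that shape, the `KeywordEvent` type, the `detect_phrases` helper, and the example phrases are all hypothetical and do not describe our production pipeline.

```python
import re
from dataclasses import dataclass


@dataclass
class KeywordEvent:
    phrase: str   # phrase that was matched
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def detect_phrases(segments, phrases):
    """Return one event per phrase occurrence, stamped with its segment's time span."""
    events = []
    for seg in segments:
        text = seg["text"].lower()
        for phrase in phrases:
            # Whole-word, case-insensitive match of the phrase.
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            for _ in re.finditer(pattern, text):
                events.append(KeywordEvent(phrase, seg["start"], seg["end"]))
    return events


# Hypothetical transcript segments (assumed shape: start, end, text).
segments = [
    {"start": 0.0, "end": 2.5, "text": "Starting the time out now."},
    {"start": 2.5, "end": 5.0, "text": "Patient identity confirmed."},
]
events = detect_phrases(segments, ["time out", "confirmed"])
for event in events:
    print(event)
```

Timestamps here are only as precise as the segment boundaries; word-level timing would be needed for tighter alignment.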

Multimodal Fusion

  • Combine video and audio signals for improved detection accuracy
  • Define fusion strategies and confidence calibration
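
One simple fusion strategy is late fusion: each modality produces its own confidence score and the scores are combined afterwards. The sketch below is illustrative only. The function names, the weight of 0.6, and the threshold of 0.5 are assumptions for the example, and it presumes both scores are already calibrated to the 0-1 range.

```python
def fuse_confidences(video_conf, audio_conf, video_weight=0.6):
    """Weighted average of two per-modality confidences in [0, 1]."""
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must be between 0 and 1")
    return video_weight * video_conf + (1.0 - video_weight) * audio_conf


def is_event(video_conf, audio_conf, threshold=0.5, video_weight=0.6):
    """Declare an event when the fused confidence reaches the threshold."""
    return fuse_confidences(video_conf, audio_conf, video_weight) >= threshold


# Strong video evidence with weak audio still clears the threshold.
print(is_event(0.9, 0.4))
# Weak evidence in both modalities does not.
print(is_event(0.2, 0.3))
```

In practice the weight and threshold would be tuned on held-out data rather than fixed by hand.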

Training & Evaluation

  • Design annotation strategies and leverage active learning
  • Define and track key metrics (accuracy, F1, false positives, temporal precision)
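
To make the metrics above concrete, here is a minimal sketch of scoring detected events against ground truth with a temporal tolerance. The greedy matching rule, the 1-second tolerance, and the function names are assumptions chosen for illustration, not a prescribed evaluation protocol.

```python
def match_events(predicted, actual, tolerance=1.0):
    """Greedily match predicted timestamps to ground truth within `tolerance` seconds.

    Each ground-truth event can be matched at most once.
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = sorted(actual)
    tp = 0
    for t in sorted(predicted):
        hit = next((a for a in unmatched if abs(a - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(predicted) - tp
    fn = len(unmatched)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical event timestamps in seconds.
tp, fp, fn = match_events([10.2, 30.5, 55.5], [10.0, 30.0, 42.0], tolerance=1.0)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, round(f1, 3))
```

Tightening `tolerance` trades recall for temporal precision, so the value should reflect how exact downstream consumers need the timestamps to be.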

Model Lifecycle

  • Manage model versioning, training, and deployment
  • Support A/B testing, monitoring, and rollback strategies

Documentation

  • Maintain clear documentation for models, experiments, and design decisions

Required Qualifications

  • Bachelor’s degree (or equivalent experience) in a relevant technical field
  • 3+ years of hands-on experience in at least two of the following:
      • Computer vision (object detection, tracking, activity recognition)
      • Speech recognition or NLP for event detection
      • Multimodal ML systems
  • Strong Python skills and experience with PyTorch (or similar frameworks)
  • Experience with inference optimization (TensorRT, ONNX, quantization)
  • Experience building and evaluating ML training pipelines
  • Ability to work from structured requirements and iterate with stakeholders
  • Strong communication skills in a collaborative, remote environment

Preferred Qualifications

  • Experience in healthcare or other high-stakes, real-time systems
  • Familiarity with edge deployment (e.g., NVIDIA Jetson) and/or cloud ML (e.g., AWS)
  • Experience with privacy-aware ML and data handling
  • Knowledge of multi-object tracking (e.g., ByteTrack, BoT-SORT)
  • Experience with Whisper-based pipelines and voice activity detection

Nice to Have

  • Exposure to clinical or regulated environments
  • Experience with structured workflows and event sequencing
  • Interest in explainability and confidence calibration
  • Experience working in distributed, remote teams

What We Offer

  • Fully remote, globally distributed team
  • Opportunity to own and shape a core ML pipeline
  • Work on meaningful, real-world ML applications in high-impact environments
  • Collaborative and fast-moving engineering culture
  • Competitive compensation, benefits, and equity (based on experience)

