Human Review Crucial as ASR Accuracy Plateaus, Finds 3Play Media Report | Martech Edge | Best News on Marketing and Technology

Business Wire

Published on: May 27, 2025

Automatic Speech Recognition (ASR) technology continues to advance but has reached a plateau in accuracy improvements for English pre-recorded content. According to the latest State of ASR report by 3Play Media, human review remains essential to meet accessibility standards for captioning and transcription.

Insights from the 3Play Media Report

  • ASR Accuracy Plateau

    • Despite substantial progress, error rates across leading ASR engines still fall short of accessibility requirements.

    • The gap between top-performing engines and others has widened.

  • Study Scope

    • Evaluated 205 hours of diverse audio content, a 30% increase from last year, spanning multiple industries and use cases.

    • Tested eight ASR engines as well as Gemini, a multimodal large language model (LLM).

  • Engine Performance

    • Whisper X showed improved accuracy and avoided hallucinations found in earlier Whisper versions.

    • AssemblyAI’s Universal-2 and Whisper X outperformed Speechmatics, with all three ahead of other tested engines.

  • Industry Variations

    • ASR accuracy varies by industry, highlighting the need for tailored solutions based on content type.

    • Sports content remains the most challenging because of noisy environments and complex terminology, with error rates three times higher than those of the top-performing industries.

  • LLMs and Future Trends

    • Large language models are not yet ready to replace dedicated ASR engines for transcription.

    • Future ASR innovation will likely focus on real-time processing and support for non-English languages rather than further improving English pre-recorded content accuracy.

While ASR technologies are becoming more sophisticated, 3Play Media’s report emphasizes the ongoing necessity of human-in-the-loop workflows to ensure captioning and transcription meet accessibility standards. The report also suggests that future ASR developments will shift toward new applications and broader language capabilities.
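
For readers unfamiliar with how ASR error rates are usually computed, the sketch below shows word error rate (WER), a common metric in the field. This summary does not say which exact metric or text normalization the 3Play Media report uses, so the function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are illustrative assumptions, not the report's methodology.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative only: normalization here is just lowercasing and
    whitespace splitting, which is an assumption, not the report's method.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sports-style sentence: one dropped word and one wrong word.
reference = "the home team scored in the final minute"
hypothesis = "the home team scored in final minutes"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 25.00%
```

In this example, two errors against an eight-word reference give a WER of 25%. A human reviewer correcting those two words would bring the transcript to 0% on this measure, which is the kind of gap the human-in-the-loop workflow described above is meant to close.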