Twelve Labs Unveils Marengo 2.7: A Breakthrough in Multimodal Video Understanding

PRWeb

Published on : Dec 9, 2024

Twelve Labs, a leader in video understanding technology, has announced Marengo 2.7, its latest state-of-the-art multimodal embedding model. The next-generation model delivers a performance improvement of more than 15% over its predecessor, Marengo 2.6. With this advancement, Marengo 2.7 introduces a groundbreaking multi-vector representation approach, allowing it to analyze video content with unprecedented accuracy and depth.

What’s New in Marengo 2.7?

1. Multi-Vector Representation: A First of Its Kind

Unlike traditional models that condense all data into a single vector, Marengo 2.7 uses multiple specialized vectors to analyze different aspects of video content. These vectors independently process:

  • Visual content: Objects, actions, and appearances.
  • Motion patterns: Transitions and movement within a scene.
  • Speech and OCR text: Spoken audio and overlaid text (captions, subtitles, or other on-screen text).

This novel approach allows Marengo 2.7 to achieve superior video analysis by breaking the raw input into distinct elements rather than compressing everything into one representation.
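The idea can be sketched in a few lines. The snippet below is a hypothetical illustration, not Twelve Labs' actual model or API: each clip is represented by one embedding vector per modality (visual, motion, speech/OCR) rather than a single fused vector. The encoder here is a random stand-in, and the modality names and dimensions are assumptions for the sake of the example.

```python
import numpy as np

# Illustrative modality set, mirroring the article's three vector types.
MODALITIES = ("visual", "motion", "speech_ocr")

def embed_clip(rng: np.random.Generator, dim: int = 8) -> dict[str, np.ndarray]:
    """Stand-in encoder: returns one unit-normalized vector per modality.

    A real model would compute each vector from the clip's pixels, motion,
    and audio/text; here random vectors just show the data layout.
    """
    clip = {}
    for m in MODALITIES:
        v = rng.standard_normal(dim)
        clip[m] = v / np.linalg.norm(v)  # unit-normalize for cosine similarity
    return clip

rng = np.random.default_rng(0)
clip = embed_clip(rng)
print(sorted(clip))  # ['motion', 'speech_ocr', 'visual']
```

Keeping the modalities as separate vectors means a downstream search can match a query against whichever aspect of the clip is relevant, instead of hoping the signal survives compression into one representation.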

2. Improved Performance Across Key Metrics

Marengo 2.7 has demonstrated impressive capabilities, including:

  • 90.6% average recall in object search (a 32.6% increase over its predecessor).
  • 93.2% recall in speech search (2.8% higher than specialized speech-to-text systems).

These results highlight the model’s ability to analyze and interpret complex videos effectively, with accurate visual, motion, and audio search capabilities.

Why Multi-Vector Representation Matters

Enhanced Search Capabilities:

With its multi-vector approach, Marengo 2.7 enables users to:

  • Search for specific brands or logos within videos.
  • Match images to corresponding video segments.
  • Locate exact moments in audio, even within complex visual scenes.
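A minimal sketch of how such a search could score clips, assuming the per-modality representation above (this is an assumed scoring scheme for illustration, not Twelve Labs' actual retrieval method): embed the query, compare it against each of a clip's modality vectors with cosine similarity, and rank clips by their best-matching modality.

```python
import numpy as np

MODALITIES = ("visual", "motion", "speech_ocr")

def multi_vector_score(query: np.ndarray, clip: dict[str, np.ndarray]) -> float:
    """Score a clip as the best cosine match across its modality vectors."""
    q = query / np.linalg.norm(query)
    return max(float(q @ clip[m]) for m in MODALITIES)

# Toy index of two clips with hand-picked unit vectors (2-D for readability).
clips = {
    "logo_clip":   {"visual": np.array([1.0, 0.0]),
                    "motion": np.array([0.0, 1.0]),
                    "speech_ocr": np.array([0.0, 1.0])},
    "speech_clip": {"visual": np.array([0.0, 1.0]),
                    "motion": np.array([0.0, 1.0]),
                    "speech_ocr": np.array([0.6, 0.8])},
}

query = np.array([1.0, 0.0])  # stand-in for an embedded "brand logo" query
best = max(clips, key=lambda name: multi_vector_score(query, clips[name]))
print(best)  # logo_clip wins via its visual vector
```

Because each modality is scored independently, a query about a logo matches the visual vector while a query about spoken dialogue would match the speech/OCR vector of a different clip.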

Precision in Complex Visual and Audio Environments:

The multi-vector method allows Marengo 2.7 to distinguish nuanced details, like detecting small objects or correlating textual overlays with audio cues.

What the CEO Says

Jae Lee, CEO of Twelve Labs, highlighted the significance of this innovation:

"Twelve Labs continues to push video understanding forward in unprecedented ways...Our groundbreaking model's performance is vastly superior to anything on the market today."

Lee emphasized how Marengo 2.7 addresses challenges once thought unsolvable, opening new opportunities for users to harness advanced video insights.

Twelve Labs’ Marengo 2.7 is revolutionizing video understanding by introducing multi-vector representations for the first time ever. This approach enables more granular and precise multimodal search capabilities—across vision, motion, and audio—unlocking new possibilities for users in video analysis, content discovery, and AI-powered insights.

With Marengo 2.7, Twelve Labs is setting the standard for innovation in video analysis technology.