Google has unveiled VideoPrism, a single model capable of handling various video analysis tasks like classification, retrieval, captioning, and question answering.
VideoPrism is pre-trained on a dataset consisting of 36 million video-text pairs and an additional 582 million video clips.
In a demonstration video, Google explained that VideoPrism uses a two-stage training approach. First, it employs contrastive learning to match videos with their text descriptions, including imperfect ones.
Then, it leverages videos without text descriptions using a masked video modelling framework to predict masked patches in a video.
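The two stages described above can be illustrated with a minimal sketch. It is not Google's implementation: the function names, the temperature value and the masking ratio are assumptions chosen for illustration, and plain Python lists stand in for real video and text embeddings.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Stage 1 (sketch): symmetric contrastive loss over paired embeddings.

    Row i of video_embs is assumed to be paired with row i of text_embs.
    The temperature of 0.07 is an illustrative assumption.
    """
    v = [normalize(e) for e in video_embs]
    t = [normalize(e) for e in text_embs]
    n = len(v)
    # logits[i][j] is the scaled similarity between video i and text j.
    logits = [[dot(v[i], t[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # The correct match for row i sits at column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    # Average the video-to-text and text-to-video directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def random_patch_mask(num_patches, mask_ratio=0.75, seed=0):
    """Stage 2 (sketch): choose which video patches to hide from the model.

    Returns a list of booleans, True where a patch is masked and would have
    to be predicted. The ratio of 0.75 is an illustrative assumption.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]
```

In this sketch the loss is low when each video embedding lies closest to its own caption and high when the pairs are mismatched, which is the signal contrastive training relies on. The mask simply marks the patches a masked video model would be asked to reconstruct.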
VideoPrism can be combined with large language models for various video-language tasks such as video-text retrieval, captioning, and question-answering.
After completing tests, Google said that VideoPrism achieves state-of-the-art performance on 30 out of 33 video understanding benchmarks.
VideoPrism was tested on datasets used in scientific domains like ethology, behavioural neuroscience, and ecology.
In a statement, Google said the encoder not only performed well but also surpassed models designed specifically for those tasks, indicating its potential for scientific analysis of video data.
With this new tool, Google is now one of several Big Tech companies providing content summarisation and detailed research on videos using AI.
OpenAI’s Sora, which launched in February of this year, is a text-to-video platform. However, the software is still a work in progress with multiple weaknesses.
OpenAI has acknowledged that Sora struggles to accurately simulate the physics of a scene and may not understand specific instances of cause and effect.