Sora: A powerful text-to-video AI model

OpenAI’s new creation has the potential to disrupt several industries

2 min read · Feb 22, 2024

Image of bear holding a game pad — Image by Willi-van-de-Winkel from Pixabay

Sora is a new generative AI model developed by OpenAI that creates short videos from text prompts. It combines techniques from text and image generators in a “diffusion transformer model,” and it keeps frames coherent and consistent by operating on tokens that represent small patches of space and time.
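To make the idea of “patches of space and time” concrete, here is a minimal sketch in plain Python. It is not OpenAI’s implementation: the function name `video_to_spacetime_patches`, the patch sizes, and the toy grayscale video are all assumptions chosen for illustration. It only shows how a video can be cut into small blocks that each span a few frames and a few pixels, with every block flattened into one token.

```python
def video_to_spacetime_patches(video, pt=2, ph=4, pw=4):
    """Cut a video into flattened spacetime patches (illustrative only).

    `video` is a nested list indexed as video[frame][row][col], holding one
    grayscale value per pixel. Each returned token covers `pt` frames,
    `ph` rows and `pw` columns. The patch sizes are hypothetical defaults.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    tokens = []
    # Walk over the top-left-earliest corner of every patch.
    for t0 in range(0, T, pt):
        for h0 in range(0, H, ph):
            for w0 in range(0, W, pw):
                # Flatten the pt x ph x pw block into a single token.
                token = [
                    video[t][h][w]
                    for t in range(t0, t0 + pt)
                    for h in range(h0, h0 + ph)
                    for w in range(w0, w0 + pw)
                ]
                tokens.append(token)
    return tokens


# Toy video: 4 frames of 8x8 pixels; each value encodes its own position.
video = [[[t * 100 + h * 10 + w for w in range(8)] for h in range(8)]
         for t in range(4)]
tokens = video_to_spacetime_patches(video)
print(len(tokens), len(tokens[0]))
```

Because each token spans more than one frame, a model that reads these tokens sees motion and appearance together, which is one plausible reading of how patch-based tokens help with consistency between frames.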

Compared with other text-to-video models such as Google’s Lumiere, Sora appears more capable: it generates higher-resolution videos of up to 60 seconds, with multiple shots and dynamic interactions between elements. However, both Sora and Lumiere can produce inconsistencies and “hallucinations” in their generated videos.

The potential applications of Sora include prototyping, entertainment, advertising, and education. OpenAI suggests that larger versions of video generators like Sora could serve as simulators for various scientific experiments, although achieving this level of simulation remains challenging.


Written by Phil Siarri