OpenAI showcases Sora, a state-of-the-art generative video model

OpenAI has built a striking new generative video model called Sora. It can take a short text description and turn it into a high-definition video clip up to a minute long.

Judging from four sample videos the San Francisco-based company shared with MIT Technology Review ahead of today's announcement, OpenAI has pushed the boundaries of what is possible with text-to-video generation, a hot new research direction that we flagged as a trend to watch in 2024.

Building models that can understand video, and all of the complex interactions that occur in our world, is a key step for any future AI systems, says Tim Brooks, a scientist at OpenAI.

There is a caveat. OpenAI previewed Sora (Japanese for "sky") under conditions of tight secrecy. In an unusual move, the company would only share information about Sora if we agreed to wait until after the model was made public to consult independent experts. (We have included their comments below.) OpenAI has not yet released a technical report or demonstrated the model actually working, and it says it will not be releasing Sora anytime soon.

The first generative models that could produce video from text appeared in late 2022. Early examples from Meta, Google, and Runway were glitchy and grainy. Since then, the technology has improved rapidly. Last year, Runway's Gen-2 model produced short clips that rival big-studio animation. But most of these examples last barely a few seconds.

The sample videos from OpenAI's Sora are detailed and high-definition, and the company claims the model can generate videos up to a minute long. One clip of a Tokyo street scene, in which the camera follows a couple as they walk past a row of shops, shows that Sora has a grasp of 3D geometry.

Sora is also good at handling occlusion, according to OpenAI. A weakness of current models is that they can lose track of objects when those objects drop out of view. A street sign, for instance, may vanish for good after a truck passes in front of it.

In one video of a papercraft underwater scene, Sora has added what look like cuts between different pieces of footage, and the model has maintained a consistent style across them.

It's not perfect. In the Tokyo video, the cars on the left look smaller than the people walking beside them, and they pass through tree branches. "There's definitely some work to be done in terms of long-term coherence," says Brooks. "If someone disappears for a long time, they won't return. The model almost forgets they're there."

The sample videos shown here were no doubt chosen to present Sora at its most impressive. Without further information, it is hard to know how representative they are of the model's typical output.

It may be some time before we find out. Today's presentation is only a tech tease, and OpenAI says it has no plans to release Sora to the public for now. Instead, starting today, it will share the model with external safety testers for the first time.

In particular, the company is worried about the potential misuse of synthetic yet lifelike footage. "We're being careful about deployment here and making sure we have all our bases covered before we put this in the hands of the general public," says Aditya Ramesh, a scientist at OpenAI who created the firm's text-to-image model DALL-E.

Check back here for the latest updates.