Sam Altman has introduced OpenAI’s latest creation, Sora, which is capable of creating one-minute-long videos from text prompts.
After stunning the world with its sensational AI chatbot ChatGPT, OpenAI is back with yet another creation. The Sam Altman-led AI start-up has introduced new software that can create hyper-realistic one-minute videos based on text prompts. Called Sora, the software is currently in the red-teaming phase, during which the company is working to identify flaws in the system. OpenAI is also reportedly working with visual artists, designers, and filmmakers to gather feedback on the model. Sam Altman, the CEO of OpenAI, took to his X account to introduce Sora, the company’s video generation model. Altman went on to share a host of videos on his profile to showcase the efficiency and visual capabilities of the new AI model. While the model is currently in the red-teaming phase, OpenAI has not shared any information regarding a wider launch.
What is Sora?
According to OpenAI, Sora is a text-to-video model that generates one-minute-long videos while “maintaining the visual quality and adherence to the user’s prompt.” OpenAI claims that Sora is capable of generating complex scenes with numerous characters, specific types of motion, and accurate details of the subject and background. According to the company, the model not only understands the user’s prompt but also comprehends how those things exist in the real world.

Sora is essentially a diffusion model that can generate entire videos all at once or extend generated videos to make them longer. The model uses a transformer architecture that unlocks superior scaling performance, similar to GPT models. The AI model represents videos and images as collections of smaller units of data known as patches, each of which is similar to a token in GPT. OpenAI stated that Sora builds upon past research conducted for DALL-E and GPT models. It borrows the recaptioning technique from DALL-E 3, which involves generating descriptive captions for visual training data. Apart from generating videos from prompts in natural language, the model can take an existing image and generate a video from it; according to OpenAI, it will animate the image’s components accurately. It is also capable of extending existing videos by filling in missing frames.
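To make the “patches” idea concrete, here is a minimal, illustrative sketch (not OpenAI’s code) of how a video array might be cut into spacetime patches that play the role GPT tokens play for text. The array shapes, patch sizes, and function name are assumptions chosen for the example.

```python
# Illustrative sketch: splitting a video into "spacetime patches",
# the video analogue of text tokens in GPT-style models.
# All shapes and patch sizes here are assumed for demonstration.
import numpy as np

def to_patches(video, pt=2, ph=8, pw=8):
    """Split a (T, H, W, C) video array into flattened spacetime patches.

    pt: frames per patch; ph, pw: spatial patch height and width.
    Assumes T, H, W are divisible by pt, ph, pw respectively.
    """
    T, H, W, C = video.shape
    patches = (video
               .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
               .transpose(0, 2, 4, 1, 3, 5, 6)   # group patch dims together
               .reshape(-1, pt * ph * pw * C))    # one row per patch
    return patches

video = np.random.rand(8, 32, 32, 3)   # 8 frames of 32x32 RGB video
tokens = to_patches(video)
print(tokens.shape)                    # (64, 384): 64 patches of 384 values
```

A transformer can then attend over these patch rows exactly as a language model attends over token embeddings, which is what allows the same scaling recipe to carry over from text to video.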