
Sora is OpenAI’s new text-to-video generator: Here’s what we know about the tool


After stunning the world with its sensational AI chatbot ChatGPT, OpenAI is back with yet another creation. The Sam Altman-led AI start-up has introduced new software that can create hyperrealistic one-minute videos from text prompts. Called Sora, the software is currently in the red-teaming phase, in which the company works to identify flaws in the system. OpenAI is also reportedly working with visual artists, designers, and filmmakers to gather feedback on the model.

Sam Altman, the CEO of OpenAI, took to his X account to introduce Sora, the company’s video generation model, and shared a host of videos on his profile to showcase its visual capabilities. While the model is in red teaming, OpenAI has not shared any information regarding a wider launch.


What is Sora?

According to OpenAI, Sora is a text-to-video model that generates one-minute-long videos while “maintaining the visual quality and adherence to the user’s prompt.” OpenAI claims that Sora can generate complex scenes with numerous characters, specific types of motion, and accurate details of the subject and background. According to the company, the model understands not only what the user asks for in the prompt, but also how those things exist in the real world.

Sora is essentially a diffusion model that can generate entire videos all at once or extend generated videos to make them longer. The model uses a transformer architecture that unlocks superior scaling performance, much like GPT models. It represents videos and images as collections of smaller units of data known as patches; each patch is analogous to a token in GPT.

OpenAI states that Sora builds on past research conducted for DALL-E and GPT models. It borrows the recaptioning technique from DALL-E 3, which involves generating descriptive captions for visual training data. Apart from generating videos from natural-language prompts, the model can take an existing image and generate a video from it; according to OpenAI, it will accurately animate the image’s components. It is also capable of extending existing videos by filling in missing frames.
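To make the patch idea concrete, here is a minimal sketch of how a video tensor could be carved into flattened “spacetime patches,” the video analogue of text tokens. This is an illustration only: the function name, patch sizes, and shapes are invented for the example and say nothing about Sora’s actual internals.

```python
import numpy as np

def video_to_patches(video, pt=2, ph=4, pw=4):
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch covers `pt` frames and a `ph` x `pw` pixel region, and is
    flattened into a single vector -- analogous to one token in a GPT model.
    (Hypothetical patch sizes chosen for the sketch.)
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve the video into non-overlapping blocks, group the patch axes
    # together, then flatten each block into one row ("token").
    return (
        video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)
             .reshape(-1, pt * ph * pw * C)
    )

# Example: a tiny 8-frame, 16x16 RGB clip.
video = np.random.rand(8, 16, 16, 3)
tokens = video_to_patches(video)
print(tokens.shape)  # (64, 96): 4*4*4 patches, each 2*4*4*3 values long
```

A transformer could then operate on this sequence of patch vectors the way a GPT model operates on a sequence of text tokens, which is why the patch representation enables the same kind of scaling.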
