Video AI startup RunwayML introduces two new features for its video generator. The company is also aiming higher with its long-term world model research project.
With "Text-to-Speech", Runway adds synthetic voices to its video editor. Users can choose from a range of voices categorized by characteristics such as young, mature, female, or male. This feature is available on all plans.
Another new feature is the Ratio function, which allows you to convert a created video into different formats, such as 1:1 or 16:9, with a single click. This makes it easier to create videos for different channels.
General world models for better videos - and beyond
Runway also announced a new research initiative: The company wants to develop what it calls "world models." World models are intended to advance AI through systems that can understand and simulate the visual world.
A world model is an AI system that develops an internal representation of an environment to simulate future events in that environment. The goal of a general world model is to map and simulate real-world situations and interactions.
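To make the idea concrete, here is a minimal toy sketch in Python. It is not Runway's approach or any real system: the corridor environment, the class name `TabularWorldModel`, and all function names are hypothetical, chosen only to illustrate the loop of observing an environment, building an internal representation, and then simulating future events without touching the environment again.

```python
from collections import Counter, defaultdict


class TabularWorldModel:
    """Toy world model (illustrative only): counts observed transitions."""

    def __init__(self):
        # Internal representation: (state, action) -> counts of next states.
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        """Record one transition seen in the environment."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed next state."""
        seen = self.counts.get((state, action))
        if not seen:
            # Assumption: an unseen transition leaves the state unchanged.
            return state
        return seen.most_common(1)[0][0]

    def simulate(self, state, actions):
        """Roll the model forward through a sequence of actions."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            trajectory.append(state)
        return trajectory


def true_environment(position, action):
    """Hypothetical 1-D corridor with cells 0..4; walls clamp movement."""
    return max(0, min(4, position + action))


# Learn from experience: try every move once in every cell.
model = TabularWorldModel()
for position in range(5):
    for action in (-1, 1):
        model.observe(position, action, true_environment(position, action))

# Simulate future events purely inside the learned model.
imagined = model.simulate(0, [1, 1, 1, -1])
print(imagined)  # [0, 1, 2, 3, 2]
```

A general world model would replace the lookup table with a learned neural representation and the corridor with video, text, and other real-world data, but the observe, predict, and simulate steps play the same roles.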
An example of such a model is Wayve's GAIA-1, which was developed from visual and textual data to control autonomous vehicles based on an understanding of the environment. However, that setting is narrow and controlled compared with the open-ended real world.
A video model like Gen-2 can be considered a "very early and limited" world model because it has developed a basic understanding of physics and motion for video generation, Runway writes. However, according to the company, Gen-2 is still limited in its capabilities and struggles with complex camera or object motion.
Runway is currently working on several research challenges, including developing models that can produce consistent maps of the environment and realistic models of human behavior.
Meta's head of AI research, Yann LeCun, likewise argues that AI first needs a world model and a basic understanding of the world to make significant progress. In his view, language alone, as used in today's large language models, is not a sufficient knowledge base for achieving human-like AI.
The Runway research project, which is based on multimodal training on text, audio, image, video, and other data, is moving in a similar direction, as multimodality becomes the new norm in AI model development.