Until now, AI videos have been silent movies. But that is about to change: Pika Labs is introducing a new generative audio model.
Pika Labs has introduced text prompt-based sound effect generation for its generative AI videos. With this feature, users can add simple sounds to their videos: sizzling bacon, screeching eagles, or roaring engines.
For now, the audio generation is independent of the video content: the text prompt that guides the video generation also generates the audio, which is then overlaid on the video. Alternatively, users can enter a separate audio prompt for finer control over the sound. Pika Labs says it has trained its own model for audio generation.
Future models could perform audio and video generation in a single step, analyzing individual video frames to automatically generate matching sounds and insert them at the right moments. Multimodal large language models such as GPT-4 already show that visual comprehension is good enough for detailed image descriptions, suggesting this kind of frame-level analysis is within reach.
The new feature is currently only available to subscribers on the Pro plan, but will be rolled out on a larger scale soon. Just a few days ago, Pika Labs introduced a lip-syncing tool that allows users to add lip-synced voices to characters in AI-generated videos.
Pika Labs is an AI startup focused on generative AI for video. It was founded by Stanford graduate students Demi Guo and Chenlin Meng. With their product, the Pika video generator, users can create and edit videos in various styles such as 3D animation, anime, cartoon, and film.
Pika Labs offers both text-to-video and image-to-video generation; the latter animates existing images. The startup has raised about $55 million in its first rounds of funding.