Meta releases open source audio AI for VR and AR

Meta presents new research on artificial intelligence for realistic metaverse audio. Multimodally trained audio systems evaluate visual information and automatically adjust the sound to match.

Meta's Reality Labs, together with the University of Texas, has unveiled new AI models designed to optimize sound in VR and AR based on visual data. AI is critical for realistic sound quality in the metaverse, the company writes.

Multimodal AI for matching sound and image

Meta is releasing three new AI models as open source: visual-acoustic matching, visually-informed dereverberation and visual voice. All three models ultimately involve an AI automatically shaping the sound to match visual information. This multimodal interaction of audio, video, and text is the focus of the newly presented research.

"Existing AI models do a good job understanding images, and are getting better at video understanding. However, if we want to build new, immersive experiences for AR and VR, we need AI models that are multimodal — models that can take audio, video, and text signals all at once and create a much richer understanding of the environment," Meta's research team writes.

For example, if an AI detects that a sound is coming from a cave, it can automatically add appropriate reverberation (visual-acoustic matching). Visually-informed dereverberation works in the opposite direction: it removes the reverberation of the space where content was originally recorded, so that existing content can be matched to the current space instead.

For example, the soundscape of a recorded theater performance could be processed as if it were being performed live in the current space during an AR projection. The AI should also be able to automatically remove unwanted background noise from the original soundtrack, according to the researchers.
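
To make the cave example concrete, here is a minimal sketch of what "adding appropriate reverberation" means in classical signal processing terms: a dry signal is convolved with a room impulse response. This is not Meta's model. Meta's system predicts the target acoustics from an image, whereas this sketch assumes the reverberation time is already known and fakes the impulse response with decaying noise. The function names, the RT60 values and the 16 kHz sample rate are illustrative assumptions.

```python
import numpy as np


def synthetic_impulse_response(rt60_s, sample_rate=16000, seed=0):
    """Crude stand-in for a room impulse response (assumption, not measured data).

    Uses white noise whose amplitude decays by 60 dB over rt60_s seconds.
    """
    rng = np.random.default_rng(seed)
    n = int(round(rt60_s * sample_rate))
    t = np.arange(n) / sample_rate
    # A 60 dB drop is a factor of 1000 in amplitude, reached at t = rt60_s.
    decay = 10.0 ** (-3.0 * t / rt60_s)
    ir = rng.standard_normal(n) * decay
    ir[0] = 1.0  # direct sound arrives first
    # Normalize to unit energy so different rooms stay comparable in level.
    return ir / np.sqrt(np.sum(ir ** 2))


def add_reverb(dry, ir):
    """Apply a room's acoustics to a dry signal by linear convolution."""
    return np.convolve(dry, ir)


# Hypothetical rooms: a small room with a short tail, a cave with a long one.
room_ir = synthetic_impulse_response(rt60_s=0.5)
cave_ir = synthetic_impulse_response(rt60_s=2.0)

# A single click as the dry source signal.
click = np.zeros(1600)
click[0] = 1.0

in_room = add_reverb(click, room_ir)
in_cave = add_reverb(click, cave_ir)
```

The same click rings for far longer in the simulated cave than in the small room. Dereverberation is the inverse problem, recovering the dry signal from the reverberant one, which is considerably harder and is where the visual cues come in according to the researchers.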

Better concert experiences in the Metaverse

Another application example, according to Meta, is a virtual concert visit. In the metaverse, avatars could initially hear muffled sounds outside the concert hall, which become increasingly clear the closer they get to the stage.

The metaverse trick: Dialogue could remain clearly audible despite the increasing ambient volume, as if people were standing next to each other without loud background music. The AI could also focus the audio on small groups, for example, so that voices do not drown each other out (visual voice).

Working together, these audio systems could one day also enable "intelligent assistants" to better understand what we are saying to them, even at a loud concert or a wild party.

Meta is releasing the three AI models as open source. Paper, models, and more information are available on Meta's AI Blog.
