
Meta's Fundamental AI Research (FAIR) team has unveiled new models for image-to-text generation, text-to-music generation, and multi-token prediction, along with a technique for watermarking AI-generated speech.


Meta has released some of its latest AI models. These include Chameleon, a multimodal model that can process and generate both images and text, a multi-token prediction model for more efficient language training, and JASCO, a model for generating music from text and other inputs such as chords or beats.

Chameleon was presented in May. Unlike most large language models, which generally produce unimodal results, the multimodal Chameleon can process any combination of text and images as input and generate any combination of text and images as output. Meta is releasing the 7B and 34B variants under a non-commercial license for research purposes only.

Shortly before Chameleon, Meta also demonstrated a new approach to developing better and faster large language models: multi-token prediction. The team was able to show that multi-token prediction improves performance, coherence, and reasoning ability when training AI language models. Meta is releasing the pre-trained models for code completion under a non-commercial license, for research purposes only.


Meta releases audio model and watermarking for AI speech

The company is also publishing the text-to-music model JASCO. In addition to text, it accepts other inputs such as chords or beats, giving users more control over the generated music.

With AudioSeal, Meta is releasing an audio watermarking technology that can detect and mark AI-generated speech, even within longer audio segments. It is said to be up to 485 times faster than other methods. AudioSeal is released under a commercial license.

Summary
  • Meta's Fundamental AI Research (FAIR) team has released new models, including Chameleon, a multimodal model that can process and generate both text and images, a multi-token prediction model, and JASCO, a text-to-music model.
  • Chameleon can process any combination of text and images as input and output. Multi-token prediction is designed to improve the performance, coherence, and reasoning ability of AI language models. In addition to text, JASCO also accepts input such as chords or beats.
  • With AudioSeal, Meta introduces an audio watermarking technology specifically designed for the localized verification of AI-generated speech, which should enable faster and more efficient recognition than conventional methods.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.