Summary

Generative AI models like DALL-E 2 or Stable Diffusion store their knowledge in their parameters. Meta shows a model that can access an external database instead.

Current generative AI models like DALL-E 2 generate impressive images, but store their knowledge in the model parameters. Improvements require ever larger models and ever more training data.

Researchers are therefore working on several ways to teach models new concepts or provide direct access to external knowledge.

Ideas for such retrieval-augmented methods come from another field of generative AI: natural language processing. OpenAI, Google, and Meta have already demonstrated language models such as WebGPT that can, for example, access the Internet to check their answers.


Google and Meta rely on queries for multimodal models

In October, Google researchers demonstrated Re-Imagen (Retrieval-Augmented Text-to-Image Generator), a generative model that uses an external multimodal knowledge base to generate images.

Re-Imagen uses the database to retrieve semantic and visual information about unknown or rare objects, and thus improves the accuracy of image generation.

Google Re-Imagen accesses an external database to generate higher-quality images of rare or unseen objects. | Image: Google

In a new paper, researchers from Meta, Stanford University, and the University of Washington now demonstrate the generative model RA-CM3 (Retrieval Augmented CM3), which also uses external data. CM3 stands for "Causal Masked Multimodal Model" and is a transformer model introduced by Meta in early 2022 that can generate images and text.

Meta's RA-CM3 relies on an external database, making it much smaller

Meta's RA-CM3 was trained using part of the LAION dataset, which was also used for Stable Diffusion. Unlike Re-Imagen, RA-CM3 can process text as well as images. Text prompts and images can serve as input.

The input is processed by a multimodal encoder and passed to a retriever, which fetches relevant multimodal data from external memory; the retrieved data is then processed by a multimodal encoder as well.

A retriever queries information from external storage. | Image: Yasunaga et al.

Both data streams are then passed to the multimodal generator, which produces text or images. RA-CM3 can draw on external image and text data to generate more accurate images or image captions. The database also allows the model to retrieve images of a particular mountain, building, or monument and use them to generate an image containing that object.
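The retrieve-then-generate loop described above can be sketched in a few lines of Python. This is a toy illustration, not Meta's implementation: the hash-free bag-of-words "encoder", the tiny in-memory database, and all names here are stand-ins (a real system like RA-CM3 uses a trained multimodal encoder and a transformer generator over a LAION-scale index).

```python
# Toy sketch of retrieval-augmented generation: encode the prompt,
# retrieve the most similar entry from external memory, and condition
# the generator on both. All components are illustrative stand-ins.
from collections import Counter
from math import sqrt

def encode(text):
    """Stand-in for a multimodal encoder: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter-based vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# External memory: multimodal documents (caption plus image reference).
MEMORY = [
    {"caption": "the eiffel tower in paris at night", "image": "eiffel.jpg"},
    {"caption": "a golden retriever playing in snow", "image": "dog.jpg"},
    {"caption": "mount fuji with cherry blossoms", "image": "fuji.jpg"},
]

def retrieve(prompt, k=1):
    """Return the k memory entries most similar to the encoded prompt."""
    q = encode(prompt)
    ranked = sorted(MEMORY, key=lambda d: cosine(q, encode(d["caption"])),
                    reverse=True)
    return ranked[:k]

def generate(prompt):
    """Stand-in generator: condition on the prompt plus retrieved context."""
    context = retrieve(prompt, k=1)[0]
    return f"<image conditioned on '{prompt}' and retrieved {context['image']}>"

print(generate("a photo of the eiffel tower"))
```

The key design point the sketch captures is that world knowledge (which monument looks how) lives in `MEMORY`, not in the model's parameters, so the "model" can stay small while the database grows.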

RA-CM3 can query external images and use them for better image synthesis. | Image: Yasunaga et al.

Using the external information, the model can also better complete images. RA-CM3 also exhibits one-shot and few-shot image classification abilities, the researchers write.

RA-CM3 can also use multiple external images as input for image synthesis. | Image: Yasunaga et al.

Overall, RA-CM3 uses significantly less training data and resources and is also much smaller than comparable models, the team writes. For the largest model, the researchers used 150 million images and three billion parameters. However, the average quality of the generated images is still below that of the much larger models from OpenAI or Google.

However, the effect of scaling is also evident in RA-CM3: The largest model is ahead of the smaller variants, and the team assumes that larger models will be significantly better.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • Generative text-to-image models like DALL-E 2 store their knowledge in the model parameters. This makes them large and their training resource-intensive.
  • Retrieval-augmented multimodal models could solve this problem and improve the accuracy of AI systems.
  • Meta researchers demonstrate RA-CM3, a multimodal model that can process and generate both text and images. To do so, the system accesses an external database.
  • According to the team, this makes RA-CM3 significantly smaller than similarly performing systems and allows it, for example, to synthesize images of monuments more faithfully and comprehensively.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.