
Researchers show how Stable Diffusion can read minds: their method reconstructs images from fMRI scans with remarkable accuracy.

Researchers have been using AI models to decode information from the human brain for years. At their core, most methods use recorded fMRI scans as input to a generative AI model that produces text or images.

In early 2018, for example, a group of researchers from Japan demonstrated how a neural network could reconstruct images from fMRI recordings. In 2019, another group reconstructed images from the activity of monkey neurons, and Meta's research group, led by Jean-Remi King, has since published work that derives text from fMRI data.

In October 2022, a team at the University of Texas at Austin showed that GPT models can infer, from fMRI scans, text describing the semantic content a person has seen in a video.


In November 2022, researchers at the National University of Singapore, the Chinese University of Hong Kong, and Stanford University presented MinD-Vis, showing how diffusion models - the technology behind current generative AI systems such as Stable Diffusion, DALL-E, and Midjourney - can reconstruct images from fMRI scans with significantly higher accuracy than the approaches available at the time.

Stable Diffusion reconstructs images from brain activity without fine-tuning

Researchers at the Graduate School of Frontier Biosciences, Osaka University, and CiNet, NICT, Japan, are now using a diffusion model - more specifically, Stable Diffusion - to reconstruct visual experiences from fMRI data.

In doing so, the team eliminates the need to train and fine-tune complex AI models. All that needs to be trained are simple linear models that map the fMRI signals from lower and higher visual brain regions to individual Stable Diffusion components.

Specifically, the researchers map brain regions as inputs to Stable Diffusion's image and text encoders: the lower visual regions are mapped to the image encoder, the higher visual regions to the text encoder. This allows the system to draw on both image composition and semantic content for the reconstruction, they say.

The researchers use Stable Diffusion to reconstruct a series of images viewed by subjects in fMRI scanners. | Image: Takagi et al.
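To make the setup concrete, here is a minimal sketch in Python of how such a pipeline can be wired up. It is not the authors' code: the array names and shapes, the ridge penalty, the random placeholder data, and the use of Stable Diffusion v1.5 via the diffusers library are all assumptions.

```python
# Minimal sketch of the decoding idea (illustrative, not the authors' code).
import numpy as np
import torch
from sklearn.linear_model import Ridge
from diffusers import StableDiffusionPipeline

# Placeholder data standing in for preprocessed fMRI/NSD arrays: voxel
# responses per seen image, plus Stable Diffusion targets computed from
# those images (VAE latents) and their captions (CLIP text embeddings).
rng = np.random.default_rng(0)
n = 800                                     # training stimuli
X_early = rng.standard_normal((n, 5000))    # lower/early visual cortex voxels
X_higher = rng.standard_normal((n, 7000))   # higher visual cortex voxels
Z = rng.standard_normal((n, 4 * 64 * 64))   # flattened VAE latents z
C = rng.standard_normal((n, 77 * 768))      # flattened CLIP text embeddings c

# The only components that are trained: two simple linear maps.
reg_z = Ridge(alpha=1e4).fit(X_early, Z)    # lower regions  -> latent z
reg_c = Ridge(alpha=1e4).fit(X_higher, C)   # higher regions -> conditioning c

# Decode one scan (a training row stands in for a held-out test scan)
# and render it with the frozen, untouched pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
z_hat = torch.tensor(reg_z.predict(X_early[:1]).reshape(1, 4, 64, 64),
                     dtype=torch.float16, device="cuda")
c_hat = torch.tensor(reg_c.predict(X_higher[:1]).reshape(1, 77, 768),
                     dtype=torch.float16, device="cuda")

# Img2img-style: noise the predicted latent partway, then denoise it under
# the predicted conditioning. z_hat fixes the layout, c_hat the semantics.
# (Classifier-free guidance is omitted for brevity.)
pipe.scheduler.set_timesteps(50)
start = 25                                  # mid-strength starting point
with torch.no_grad():
    t0 = pipe.scheduler.timesteps[start]
    latents = pipe.scheduler.add_noise(z_hat, torch.randn_like(z_hat), t0)
    for t in pipe.scheduler.timesteps[start:]:
        noise_pred = pipe.unet(latents, t, encoder_hidden_states=c_hat).sample
        latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
# `image` is a (1, 3, 512, 512) tensor in [-1, 1].
```

With real data, Z and C would be computed once by encoding the seen images and their captions with the pipeline's own VAE and text encoder; the diffusion model itself is never trained or fine-tuned.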

For their experiment, the researchers draw on fMRI recordings from the Natural Scenes Dataset (NSD) and test whether Stable Diffusion can reconstruct what the subjects saw.


They show that the combination of image and text decoding provides the most accurate reconstruction. Accuracy differs between subjects, but these differences correlate with the quality of the fMRI data, the team says.

Reconstructions by Stable Diffusion using only the lower visual regions (z), only the higher regions (c), and both in combination. | Image: Takagi et al.
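Reconstruction accuracy in studies like this is often scored with a two-way identification test; whether the paper uses exactly this metric is not stated here, so the following is a generic sketch. A reconstruction counts as correct if it is more similar to its ground-truth image than to a randomly chosen foil, with similarity computed in some feature space (for example, CLIP embeddings of the images).

```python
# Generic two-way identification score (a common metric in this literature;
# the paper's exact evaluation may differ).
import numpy as np

def two_way_identification(recon_feats: np.ndarray,
                           true_feats: np.ndarray,
                           seed: int = 0) -> float:
    """recon_feats, true_feats: (n_images, n_features); returns accuracy."""
    rng = np.random.default_rng(seed)
    n = len(recon_feats)
    correct = 0
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # random foil image
        sim_true = np.corrcoef(recon_feats[i], true_feats[i])[0, 1]
        sim_foil = np.corrcoef(recon_feats[i], true_feats[j])[0, 1]
        correct += sim_true > sim_foil
    return correct / n  # chance level is 0.5
```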

fMRI reconstruction leads to a better understanding of diffusion models

According to the team, the quality of the reconstructions is on par with the best current methods, but without the need to train the AI models those methods rely on.

Conversely, the team also uses models derived from the fMRI data to examine individual building blocks of Stable Diffusion, such as how semantic content emerges during the reverse diffusion process or what happens inside the U-Net.

In addition, the team quantitatively interprets the image transformations at different stages of the diffusion process. In this way, the researchers aim to contribute to a better understanding, from a biological perspective, of diffusion models, which are widely used but still poorly understood.
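The following sketch shows how such an analysis can be set up in principle; it is illustrative, not the paper's code. The choice of module (the U-Net's mid block), the spatial pooling, the prompt, and the v1.5 checkpoint are all assumptions.

```python
# Illustrative setup for the reverse analysis: record U-Net activations at
# each denoising step with a forward hook, then relate them to measured
# brain responses with a linear encoding model.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

features = []                               # one entry per denoising step
hook = pipe.unet.mid_block.register_forward_hook(
    lambda mod, args, out: features.append(
        out.float().mean(dim=(2, 3)).cpu().numpy())  # pool away spatial dims
)
_ = pipe("a photograph of a dog on a beach", num_inference_steps=50)
hook.remove()

F = np.stack(features)  # (50 steps, batch, channels); batch=2 with guidance
# Repeating this over many stimuli gives a feature matrix per step; fitting
# e.g. sklearn's Ridge from those features to recorded voxel responses shows
# which denoising steps and U-Net blocks best predict which visual areas.
```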

Summary
  • Japanese researchers used Stable Diffusion to reconstruct visual images from fMRI data.
  • In this method, the lower and higher visual brain regions are mapped directly to Stable Diffusion's image and text encoders.
  • According to the team, the combination of these two inputs enables an accurate reconstruction that approximates image composition and semantic content.