AI system MinD-Vis decodes images from MRI scans

A new AI system captures the semantic content of images reconstructed from MRI data about two-thirds better than older systems. A larger training dataset and diffusion models make this possible.

Can AI models decode thoughts? Experiments such as those by a Meta research group led by Jean-Remi King attempt to decode words or sentences from MRI data using language models.

Recently, a research group demonstrated an AI system that decodes MRI data from a person watching a video into text describing some of the visible events.

These technologies could one day lead to advanced interfaces that, for example, allow people with disabilities to better communicate with their surroundings or control a computer.

A new study now relies on diffusion models to reconstruct images from human MRI data. Diffusion models are also used in advanced image AI systems such as DALL-E 2 or Stable Diffusion, where they generate images by gradually removing noise from a random starting point.

MinD-Vis relies on diffusion and 340 hours of MRI scans

Researchers from the National University of Singapore, the Chinese University of Hong Kong, and Stanford University demonstrate "Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding" - MinD-Vis for short.

The work aims to create a diffusion-based AI model that can decode visual stimuli from brain data, laying a foundation for linking human and machine vision.

MinD-Vis learns to reconstruct images from MRI scans. | Image: Chen et al.

First, the AI system learns an effective representation of MRI data via self-supervised learning. Embeddings of this data then condition the image generation of the diffusion model.
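The paper's title names the self-supervised step as masked modeling: parts of the input are hidden and the model learns to reconstruct them. The following is a minimal sketch of the masking step only, assuming a flattened one-dimensional signal cut into fixed-size patches. The function name, patch size, and mask ratio are illustrative assumptions, not values taken from the paper.

```python
import random


def mask_patches(signal, patch_size, mask_ratio, seed=0):
    """Split a 1D signal into patches and hide a random subset of them.

    Returns (visible, masked_idx): `visible` maps patch index -> patch values
    for the patches an encoder would see; `masked_idx` lists the patches a
    decoder would have to reconstruct. (Hypothetical helper for illustration.)
    """
    if len(signal) % patch_size != 0:
        raise ValueError("signal length must be a multiple of patch_size")
    patches = [signal[i:i + patch_size] for i in range(0, len(signal), patch_size)]
    n_masked = int(len(patches) * mask_ratio)
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    masked_idx = sorted(rng.sample(range(len(patches)), n_masked))
    hidden = set(masked_idx)
    visible = {i: p for i, p in enumerate(patches) if i not in hidden}
    return visible, masked_idx


# Toy stand-in for one flattened MRI segment (made-up values).
signal = [float(v) for v in range(32)]
visible, masked_idx = mask_patches(signal, patch_size=4, mask_ratio=0.75)
print(len(visible), "visible patches,", len(masked_idx), "masked patches")
```

In a full pipeline, an encoder would see only the visible patches, and its output embedding would be what conditions the diffusion model. That part is omitted here because the article gives no architectural details.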

For training, the team relies on data from the Human Connectome Project and the Generic Object Decoding Dataset. In total, the training data comprises 136,000 MRI segments from 340 hours of MRI scans, the largest dataset to date for a brain decoding AI system.

MinD-Vis captures semantic details and image features

While the first dataset consists entirely of MRI data, the second includes 1,250 different images from 200 classes. The team withheld 50 of the images for testing.

For further validation of their approach, the researchers also relied on the Brain, Object, Landscape Dataset, which includes 5,254 MRI-image pairs.

According to the publication, MinD-Vis significantly outperforms older models: the system is 66 percent better at capturing semantic content and 41 percent better in the quality of the generated images.

In the end, however, this still leaves the system far from being able to reliably read thoughts: despite the improvement, the accuracy in capturing semantic content is only 23.9 percent.
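To put the two reported figures in relation, the older score can be backed out with simple arithmetic. The sketch below assumes that the 66 percent is a relative gain on the same accuracy metric as the 23.9 percent, which the article does not state explicitly. The function name is hypothetical.

```python
def implied_baseline(new_score, relative_gain):
    """Back out the older score from a new score and a relative gain.

    Assumes new_score = old_score * (1 + relative_gain).
    """
    return new_score / (1.0 + relative_gain)


baseline = implied_baseline(23.9, 0.66)
print(f"implied older accuracy: {baseline:.1f} percent")
```

Under that assumption, the previous systems would have reached roughly 14.4 percent, so the gain is large in relative terms but only about 9.5 percentage points in absolute terms.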

MinD-Vis' image quality and semantic accuracy are significantly better than those of previous systems, but remain low overall. | Image: Chen et al.

Moreover, the quality of the reconstructed images varied between subjects, which the team describes as a well-known phenomenon in the field. In addition, some of the image classes tested were not included in the training dataset. More data could therefore further improve the quality of the system.

More information and examples are available on the MinD-Vis project page.
