AI models get better with data unrelated to their actual tasks

Feb 4, 2024


Researchers are investigating whether multimodality makes AI models more powerful, even when the data is not directly related.

Multimodal AI models, such as Google's Gemini, can process text, images, and sound. Such models often use paired data from different modalities, such as images and their associated text descriptions. An approach called Multimodal Pathway instead focuses on scenarios where the data sets come from different modalities but are not directly related to each other.

Multimodal Pathway Transformer finds positive effect

The team from the Chinese University of Hong Kong and Tencent AI Lab specifically investigated whether the performance of an AI model for one modality, such as image recognition, improves when data from another, unrelated modality, such as audio or point clouds, is also used.

For this purpose, the researchers developed the Multimodal Pathway Transformer (M2PT). It combines a tokenizer and a head specific to the target modality with transformer blocks that are linked, via "cross-modal re-parameterization", to the transformer blocks of an auxiliary model trained on data from another modality.

Cross-modal re-parameterization is a method in which each linear layer in the transformer blocks of the target model is linked to its counterpart in the auxiliary model. The outputs of both layers are added together. This approach incurs little additional training cost and no additional inference cost, making it attractive for practical application.
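The claim of no additional inference cost follows from the linearity of the layers: adding the outputs of two linear layers gives the same result as one linear layer whose weights are the sum of the two. The following is a minimal sketch of that idea in plain Python, not the researchers' implementation. All function names are hypothetical, the random layers merely stand in for trained ones, and the `scale` factor on the auxiliary branch is an assumption that the article does not describe.

```python
import random


def linear(layer, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(layer["weight"], layer["bias"])
    ]


def two_branch_forward(target, auxiliary, x, scale=1.0):
    """Training-time view: run both linear layers and add their outputs.

    `scale` weights the auxiliary branch; it is an assumption of this sketch.
    """
    y_target = linear(target, x)
    y_aux = linear(auxiliary, x)
    return [t + scale * a for t, a in zip(y_target, y_aux)]


def merge_layers(target, auxiliary, scale=1.0):
    """Inference-time view: fold the auxiliary layer into the target layer."""
    weight = [
        [wt + scale * wa for wt, wa in zip(row_t, row_a)]
        for row_t, row_a in zip(target["weight"], auxiliary["weight"])
    ]
    bias = [bt + scale * ba for bt, ba in zip(target["bias"], auxiliary["bias"])]
    return {"weight": weight, "bias": bias}


def random_layer(rng, out_dim, in_dim):
    """Hypothetical stand-in for a trained linear layer."""
    return {
        "weight": [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)],
        "bias": [rng.uniform(-1, 1) for _ in range(out_dim)],
    }


rng = random.Random(0)
target_layer = random_layer(rng, out_dim=3, in_dim=4)     # e.g. from an image model
auxiliary_layer = random_layer(rng, out_dim=3, in_dim=4)  # e.g. from an audio model
x = [rng.uniform(-1, 1) for _ in range(4)]

# Two layers evaluated separately, outputs added.
added = two_branch_forward(target_layer, auxiliary_layer, x, scale=0.5)
# One merged layer evaluated once.
merged = linear(merge_layers(target_layer, auxiliary_layer, scale=0.5), x)

max_diff = max(abs(a - m) for a, m in zip(added, merged))
print(f"max difference between the two views: {max_diff:.2e}")
```

The two results agree up to floating-point rounding. Because the merged layer has the same shape as the original target layer, a model deployed with merged weights performs the same amount of computation as one without the auxiliary branch.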

In the researchers' experiments, the Multimodal Pathway approach led to significant and consistent performance improvements in image, point cloud, video, and audio recognition.

AI model benefits from complementary knowledge

Why does it work? The researchers suggest that a model trained on data from one modality has encoded knowledge that can help another model process input sequences from a different modality. This "modality-complementary knowledge" seems to exist and to be transferable, the team says, even if the data from the two modalities is unrelated.

However, a theoretical explanation for the observed improvements is still missing. Finding one could lead to a deeper understanding of the mechanism and of neural networks in general and is, according to the team, a topic for future research.
