
Researchers are investigating whether multimodality makes AI models more powerful, even when the data is not directly related.

Multimodal AI models such as Google's Gemini can process text, images, and sound. Such models are typically trained on paired data from different modalities, for example images and their associated text descriptions. The Multimodal Pathway approach, by contrast, focuses on scenarios where the data sets come from different modalities but have no direct relevance to each other.

Multimodal Pathway Transformer finds positive effect

The team from the Chinese University of Hong Kong and the Tencent AI Lab specifically investigated whether the performance of AI models for one modality, such as image recognition, improves when data from another, seemingly unrelated modality, such as audio or point clouds, is also used.

For this purpose, the researchers developed the Multimodal Pathway Transformer (M2PT), which keeps a tokenizer and a head specific to the target modality, while its transformer blocks are linked via "cross-modal re-parameterization" to the blocks of an auxiliary model trained on data from another modality.
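In rough terms, the resulting model keeps the target modality's own tokenizer and task head, and the auxiliary model's knowledge enters only through the transformer blocks. The following Python sketch illustrates that structure, assuming a PyTorch-style setup; the class and attribute names are illustrative rather than taken from the paper's code, and the augmented linear layers themselves are sketched after the next paragraph.

import torch.nn as nn

class M2PTSketch(nn.Module):
    """Illustrative structure of a Multimodal Pathway Transformer (hypothetical names)."""

    def __init__(self, tokenizer: nn.Module, blocks: nn.ModuleList, head: nn.Module):
        super().__init__()
        self.tokenizer = tokenizer  # target-modality tokenizer, e.g. image patch embedding
        self.blocks = blocks        # transformer blocks whose linear layers also carry
                                    # weights from an auxiliary model of another modality
        self.head = head            # target-modality task head, e.g. a classifier

    def forward(self, x):
        tokens = self.tokenizer(x)  # raw input -> token sequence
        for block in self.blocks:
            tokens = block(tokens)  # standard transformer computation
        return self.head(tokens)    # prediction for the target task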

Cross-modal re-parameterization is a method in which each linear layer in the transformer blocks of the target model is linked to its counterpart in the auxiliary model, and the outputs of both layers are added together. Because this sum is mathematically equivalent to a single linear layer with merged weights, the approach incurs little additional training cost and no additional inference cost, making it attractive for practical use.
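For a single linear layer, the trick can be written out directly: adding the auxiliary layer's output is the same as adding its scaled weight matrix to the target weight, and that merged matrix can be computed once after training. The snippet below is a minimal, hypothetical PyTorch illustration of this idea, not the authors' implementation; the mixing scalar lam and all other names are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalLinear(nn.Module):
    """Sketch of cross-modal re-parameterization for one linear layer (hypothetical)."""

    def __init__(self, target: nn.Linear, aux: nn.Linear):
        super().__init__()
        self.target = target  # trainable layer of the target model
        # frozen weights of the counterpart layer in the auxiliary model
        self.register_buffer("aux_weight", aux.weight.detach().clone())
        self.lam = nn.Parameter(torch.zeros(()))  # learned mixing scalar

    def forward(self, x):
        # Adding the two layer outputs equals applying a single merged weight matrix.
        merged = self.target.weight + self.lam * self.aux_weight
        return F.linear(x, merged, self.target.bias)

    def merge(self) -> nn.Linear:
        # After training, fold the auxiliary weight into a plain linear layer,
        # so inference runs at the cost of the original model alone.
        fused = nn.Linear(self.target.in_features, self.target.out_features,
                          bias=self.target.bias is not None)
        with torch.no_grad():
            fused.weight.copy_(self.target.weight + self.lam * self.aux_weight)
            if self.target.bias is not None:
                fused.bias.copy_(self.target.bias)
        return fused

In such a setup, every linear layer inside, say, an image transformer's blocks would be wrapped together with the corresponding layer of an equally sized audio or point cloud transformer, and the wrappers would be replaced by their merged layers before deployment.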

Applying the multimodal pathway approach led to significant and consistent performance gains across modalities: in the researchers' experiments, it improved results in image, point cloud, video, and audio recognition.

AI model benefits from complementary knowledge

Why does it work? The researchers suggest that a model trained on data from one modality encodes knowledge that can also benefit a model whose input sequences come from a different modality. This "modality-complementary knowledge" seems to exist and to be transferable, the team says, even when the data from the two modalities is unrelated.

However, a theoretical explanation for the observed improvements is still missing. Finding one could lead to a deeper understanding of the mechanism, and of neural networks in general, and is, according to the team, a topic for future research.

Summary
  • Researchers from the Chinese University of Hong Kong and Tencent AI Lab investigated whether multimodality can improve the performance of AI models, even when data from different modalities are not directly linked.
  • They developed the Multimodal Pathway Transformer (M2PT), which links the layers of models trained on different modalities via "cross-modal re-parameterization," and showed significant performance improvements in image, point cloud, video, and audio recognition.
  • The researchers hypothesize that the AI model benefits from complementary knowledge encoded in different modalities, even when the data between modalities is irrelevant. However, a theoretical justification for these improvements is still open and subject to future research.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.