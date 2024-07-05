AI research
Maximilian Schreiner

Google DeepMind's JEST speeds up AI training by 13x while slashing computing needs

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Researchers from Google DeepMind have developed a method called JEST that makes training AI models for image and text processing significantly more efficient.

Multimodal AI models learn to link images and texts by maximizing the correspondence of related image-text pairs and minimizing the correspondence of unrelated pairs. Traditionally, the examples in each training batch are either sampled at random or selected individually according to their relevance.
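The contrastive objective described above can be illustrated with a minimal sketch. It assumes a generic symmetric softmax contrastive loss over a square similarity matrix, which is one common formulation and not necessarily the exact loss used in the paper. The function name and the toy matrices are hypothetical.

```python
import math

def contrastive_loss(sim):
    """Symmetric softmax contrastive loss (an assumed, generic formulation).

    sim[i][j] is the similarity between image i and text j.
    Matching image-text pairs sit on the diagonal.
    """
    n = len(sim)

    def mean_neg_log_prob(rows):
        # Average negative log-probability of picking the matching partner.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    cols = [list(c) for c in zip(*sim)]  # text-to-image direction
    return 0.5 * (mean_neg_log_prob(sim) + mean_neg_log_prob(cols))

# Toy similarity matrices: related pairs score high in `aligned`,
# unrelated pairs score high in `confused`.
aligned = [[5.0, 0.0], [0.0, 5.0]]
confused = [[0.0, 5.0], [5.0, 0.0]]
print(contrastive_loss(aligned) < contrastive_loss(confused))
```

The loss is low when matching pairs are more similar than mismatched ones, which is the behavior training pushes toward.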

However, the researchers argue that the quality of a batch depends not only on the sum of the individual data points but also on their composition. Therefore, they have developed an algorithm that selects subsets of data from a larger "super batch" based on their collective learnability.
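To see why composition matters, consider a toy sketch in which the value of a batch is the sum of pairwise scores between its members. The greedy selection below is a simplified stand-in chosen for illustration; the article does not describe the paper's actual sampling procedure, and the `pair_score` matrix and function names are hypothetical.

```python
def joint_score(selected, pair_score):
    """Sum of pairwise scores over all ordered pairs in the set (i == j included)."""
    return sum(pair_score[i][j] for i in selected for j in selected)

def greedy_select(pair_score, k):
    """Pick k examples one at a time, each time maximizing the joint score."""
    selected = []
    remaining = set(range(len(pair_score)))
    while len(selected) < k:
        best = max(
            sorted(remaining),
            key=lambda c: joint_score(selected + [c], pair_score),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical "super batch" of four examples. The diagonal holds each
# example's individual value; off-diagonal entries hold interactions.
pair_score = [
    [3, -2, 0, 0],
    [-2, 3, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]

individual_top2 = [0, 1]  # the two highest diagonal entries
print(joint_score(individual_top2, pair_score))               # 2
print(joint_score(greedy_select(pair_score, 2), pair_score))  # 5
```

Examples 0 and 1 look best individually but interact badly, so a selection that accounts for the batch as a whole scores higher.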

JEST uses AI models for data selection

To determine which data is most learnable, JEST (Joint Example Selection Technique) uses two AI models: the model currently being trained and an already trained reference model. Data that is difficult for the model being trained but easy for the reference model is considered particularly useful.
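One simple way to express "difficult for the learner, easy for the reference" is the difference between the two models' losses. The sketch below assumes per-example loss values are already available and ranks the examples individually for clarity; the numbers and function names are hypothetical.

```python
def learnability(learner_loss, reference_loss):
    """Score each example as learner loss minus reference loss (assumed scoring rule)."""
    return [l - r for l, r in zip(learner_loss, reference_loss)]

def top_k(scores, k):
    """Indices of the k highest-scoring examples, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical losses for four examples.
learner_loss = [2.5, 0.25, 2.5, 1.0]     # model currently being trained
reference_loss = [0.5, 0.25, 2.25, 0.5]  # already trained reference model

scores = learnability(learner_loss, reference_loss)
print(scores)            # [2.0, 0.0, 0.25, 0.5]
print(top_k(scores, 2))  # [0, 3]
```

Example 0 is hard for the learner but easy for the reference, so it ranks first. Example 1 is already easy for both, and example 2 is hard for both, so neither scores highly.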

With this method, the team was able to shorten the training time for certain tasks by a factor of 13. At the same time, the model needed ten times less computation to reach the same performance as with conventional methods.

According to the researchers, the choice of the reference model, which is pre-trained on a small, high-quality dataset, is crucial. Its quality limits the potential improvements. Increasing the reference dataset from 100 million to 600 million examples while maintaining high quality improved the results further.

Flexi-JEST achieves top score with 10 percent of training data

To offset the additional computational cost of evaluating the "super batch," the scientists also introduced a variant called Flexi-JEST. It scores the data with a simplified version of the model that works at coarser image resolution, and it trains the model at full and reduced resolution in parallel.

With Flexi-JEST, a model achieved better average performance on eight standard tasks after 4 billion training examples than the current best model, SigLIP, after 40 billion examples. This corresponds to a saving of 90 percent of the computing operations.
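The 90 percent figure follows from the ratio of training examples, as this short check shows. It counts examples seen only, using the two numbers reported above; the variable names are hypothetical.

```python
# Training examples reported in the article.
flexi_jest_examples = 4_000_000_000
siglip_examples = 40_000_000_000

fraction = flexi_jest_examples / siglip_examples  # share of examples needed
saving = 1 - fraction                             # share of examples saved

print(f"{fraction:.0%} of the examples, {saving:.0%} saved")
```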

According to the researchers, the results show the potential to learn from small, carefully curated datasets to filter much larger, unstructured amounts of data - a process they call "data quality bootstrapping." This could pave the way for more efficient AI models that require less computing power and training data.

Summary
  • Google DeepMind researchers have developed a method called JEST that makes training multimodal AI models for image and text processing more efficient by selecting subsets of data according to their joint learnability.
  • JEST uses two AI models - the model to be trained and a pre-trained reference model - to find out which data is particularly instructive. This reduces the training time by a factor of 13 and the required computing power by 90%.
  • The Flexi-JEST variant uses a simplified version of the model for data evaluation, and achieves better performance than the current leading model with only 10% of the training data. The researchers see the potential for learning from small, carefully curated data sets to filter large, unstructured amounts of data.
Sources
Arxiv