Adept's multimodal Fuyu-Heavy model is adept at understanding UIs and inferring actions to take
Adept recently introduced Fuyu-Heavy, a new multimodal AI model built for digital agents. According to the company, Fuyu-Heavy is the third most capable multimodal model after GPT-4V and Gemini Ultra, excelling in multimodal reasoning and UI understanding. It performs well on traditional multimodal benchmarks and matches or exceeds comparable models on standard text-based benchmarks. In chat evaluations it scores roughly on par with Claude 2.0, and it edges out Gemini Pro on the MMMU benchmark. Fuyu-Heavy will soon power Adept's enterprise product, and lessons from its development have already been applied to its successor. The following video demonstrates the model's ability to understand a user interface.