MiniGPT-4 is another example of open-source AI on the rise

OpenAI introduced GPT-4 as a multimodal model with image understanding, but has not yet released the image part of the model. MiniGPT-4 makes it available today - as an open-source model.

MiniGPT-4 is a chatbot with image understanding, a feature that OpenAI introduced at the launch of GPT-4 but has not yet released outside the Be My Eyes app.

Like its larger counterpart, MiniGPT-4 can describe images or answer questions about the content of an image: for example, given a picture of a prepared dish, the model can output a (possibly) matching recipe (see featured image) or generate an appropriate image description for visually impaired people. Similar to Midjourney's new "/describe" feature, MiniGPT-4 could extract prompts, or at least prompt ideas, from images. According to the researchers, MiniGPT-4 can also handle the much-touted image-to-website feature that OpenAI introduced at the GPT-4 launch.

MiniGPT-4 generates matching HTML code based on a hand-drawn web page sketch. | Image: Zhu, Chen et al.

"Our findings reveal that MiniGPT-4 possesses many capabilities similar to those exhibited by GPT-4 like detailed image description generation and website creation from hand-written drafts," the paper states.

The development team has made the code, demos, and training instructions for MiniGPT-4 available on GitHub. They have also announced a smaller version of the model that will run on a single Nvidia 3090 graphics card. The demo video below shows some examples.

Open-source AI is on the rise

The remarkable thing about MiniGPT-4 is that it is based on the Vicuna-13B LLM and the BLIP-2 Vision Language Model, open-source software that can be trained and fine-tuned for comparatively little money and without massive data and computational overhead.

The research team first trained MiniGPT-4 with about five million image-text pairs in ten hours on four Nvidia A100 cards. In a second step, the model was refined with 3,500 high-quality text-image pairs generated by an interaction between MiniGPT-4 and ChatGPT. ChatGPT corrected the incorrect or inaccurate image descriptions generated by MiniGPT-4.

"Fix the error in the given paragraph. Remove any repeating sentences, meaningless characters, not English sentences, and so on. Remove unnecessary repetition. Rewrite any incomplete sentences. Return directly the results without explanation. Return directly the input paragraph if it is already correct without explanation."

ChatGPT prompt for MiniGPT-4
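As a rough illustration of how such a prompt could be applied, the sketch below pairs the quoted instruction with a draft image description to form the text sent to a chat model. The helper name `build_refinement_prompt`, the layout of the message, and the sample caption are hypothetical; no API call is made, and the researchers' actual pipeline may look different.

```python
# Instruction text as quoted in the article (the ChatGPT prompt for MiniGPT-4).
REFINEMENT_INSTRUCTION = (
    "Fix the error in the given paragraph. Remove any repeating sentences, "
    "meaningless characters, not English sentences, and so on. "
    "Remove unnecessary repetition. Rewrite any incomplete sentences. "
    "Return directly the results without explanation. "
    "Return directly the input paragraph if it is already correct "
    "without explanation."
)


def build_refinement_prompt(draft_description: str) -> str:
    """Combine the fixed instruction with one draft image description.

    Hypothetical helper: the layout (instruction, blank line, paragraph)
    is an assumption, not the researchers' documented format.
    """
    # Collapse stray whitespace and line breaks in the draft.
    cleaned = " ".join(draft_description.split())
    if not cleaned:
        raise ValueError("draft description is empty")
    return f"{REFINEMENT_INSTRUCTION}\n\n{cleaned}"


if __name__ == "__main__":
    # Made-up draft caption with a repeated and an incomplete sentence.
    draft = "A plate of pasta on a table.  A plate of pasta on a table. The"
    print(build_refinement_prompt(draft))
```

The resulting string would then be passed to whatever chat model does the correcting, and the cleaned-up description would be stored alongside its image as one fine-tuning pair.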

This second step significantly improved the reliability and usability of the model - and required only seven minutes of training on a single Nvidia A100. The researchers themselves said they were surprised by the efficiency of their approach.
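To put the two training stages side by side, the reported figures can be converted to GPU-hours. This is plain arithmetic on the numbers quoted above; the helper name `gpu_hours` is our own, and wall-clock time multiplied by GPU count is only a rough proxy for training cost.

```python
def gpu_hours(wall_clock_hours: float, num_gpus: int) -> float:
    """Return total GPU-hours for a run (wall-clock time x GPU count)."""
    return wall_clock_hours * num_gpus


# Figures as reported in the article.
stage_one = gpu_hours(10, 4)      # ten hours on four A100 cards
stage_two = gpu_hours(7 / 60, 1)  # seven minutes on a single A100

print(f"Stage one: {stage_one:.1f} A100-hours")
print(f"Stage two: {stage_two:.2f} A100-hours")
print(f"Stage two share of total: {stage_two / (stage_one + stage_two):.2%}")
```

By this measure the first stage took about 40 A100-hours and the second roughly 0.12, so the refinement step accounts for well under one percent of the total compute.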

MiniGPT-4's language model, Vicuna, follows the "Alpaca formula" and uses ChatGPT's output to fine-tune a Meta language model of the LLaMA family. Vicuna is said to be on par with Google Bard and ChatGPT, again with a relatively small training effort.

MiniGPT-4 is another example of the rapid progress the open-source community has made in a very short time. It suggests that the moat for pure AI model companies may not be that high: just yesterday, the open-source chatbot OpenAssistant was launched, trained with instruction data collected from volunteers and intended to eventually become an open ChatGPT alternative.

Given this development, it would make sense for OpenAI to first focus on building a partner ecosystem using ChatGPT plugins for GPT-4, rather than training GPT-5 now. The research and training effort for a new model may be greater for OpenAI than the head start it might gain over competitors or the open-source community. In comparison, building a chat ecosystem is more challenging, and so harder to replicate, but likely more economically sustainable. It can also have a strong lock-in effect on users.
