
With GPT-4V, ChatGPT can now see: it analyzes graphics, photos, and other kinds of visual content, which opens up new use cases.

About half a year after announcing the multimodal version of GPT-4, now called GPT-4V(ision), OpenAI is finally rolling out this feature to paying ChatGPT customers. Within a short time, users have begun exploring what ChatGPT can do with combined image and text prompts. Here are the most interesting examples we have found so far.

Front-end development

One of the most popular use cases for large language models is programming. This is where ChatGPT's new multimodal capabilities come into play. For example, GPT-4V can reconstruct a website dashboard from screenshots or drawings.

Video: @mckaywrigley/Twitter

The AI tool seems to mimic the given structure almost perfectly, even taking the data from the template 1:1 as placeholders. Only design details like colors, shapes, and fonts are different. Nevertheless, GPT-4V might significantly shorten the development time from design to prototype in this area.

Video: @mckaywrigley/Twitter

Matt Shumer, CEO of AI startup HyperWrite, went a step further and combined GPT-4V with the concept of AutoGPT. Here, the AI uses the result of one run as a prompt for the next run, allowing it to continually improve the code on its own.

Video: @mattshumer_/Twitter
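
The feedback loop described above, where the output of one run becomes part of the prompt for the next, can be sketched roughly as follows. This is a minimal illustration and not Shumer's actual implementation: `model` is a hypothetical callable standing in for any text-generating system, `stub_model` is a fake that only bumps a version number, and a real setup would also pass the screenshot along with each prompt.

```python
from typing import Callable, List


def refine_iteratively(model: Callable[[str], str], task: str, rounds: int = 3) -> List[str]:
    """Call `model` repeatedly, feeding each result back into the next prompt."""
    history: List[str] = []
    prompt = task
    for _ in range(rounds):
        result = model(prompt)
        history.append(result)
        # The next prompt restates the task and includes the latest attempt.
        prompt = f"{task}\n\nPrevious attempt:\n{result}\n\nImprove it."
    return history


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: returns the previous version plus one."""
    marker = "Previous attempt:\n"
    if marker not in prompt:
        return "<div>v1</div>"
    previous = prompt.split(marker, 1)[1].split("\n\n", 1)[0]
    version = int(previous[len("<div>v"):-len("</div>")])
    return f"<div>v{version + 1}</div>"


if __name__ == "__main__":
    for attempt in refine_iteratively(stub_model, "Build a dashboard", rounds=3):
        print(attempt)
```

The loop keeps every intermediate result in `history`, so a caller could compare attempts or stop early once the output is good enough.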

Explain comics, memes, or infographics

GPT-4V can explain what's shown and what's meant in an image. Whether it's a cartoon, a comic, or a Twitter meme, ChatGPT will first describe the image in detail, including captions, and then, in the case of humorous images, explain why the content might be understood as funny.

Twitter: @skirano/Twitter
Image: @rcweston/Twitter

Helping with homework

AI plays an important role in education, if only because learners use ChatGPT and the like as tools. Multimodality can have a big impact on the usefulness of large language models in this context.

As Mckay Wrigley demonstrates on Twitter, GPT-4V can easily parse even complex infographics like this labeled diagram of a human cell. In his example, he then has the cell explained at a ninth-grade level and can follow up with more in-depth questions.

Video: @mckaywrigley/Twitter

The model can also break down infographics into simple text explanations, as Muratcan Koylan shows.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Image: Muratcan Koylan / @youraimarketer

Give advice on interior design or visualize the effects of psychedelic mushrooms

In addition to graphics, GPT-4V can also process photos or AI-generated images. In this example, Twitter user Pietro Schirano fed in AI images and asked ChatGPT for a name for this interior style, which the AI promptly provided with a reasonable explanation.

Image: @skirano/Twitter

Schirano underscores the potential of personalized AI with another example. He has stored some information about himself in the custom instructions, which ChatGPT takes into account when evaluating his interior.

Image: @skirano/Twitter

The following example shows an interesting combination of image input, image description, and resulting image generation: Schirano first shows GPT-4V a photo of psychedelic mushrooms. Without telling the system which mushrooms they are, he asks GPT-4V how one would feel after eating them.

Image: @skirano/Twitter

The system provides an appropriate description and can then turn the described hallucinations into an image prompt, which generates the following output via DALL-E 2.

Image: Pietro Schirano / @skirano

With the launch of DALL-E 3 in October for all paying users in ChatGPT, this workflow should be possible directly in ChatGPT Plus, and with much higher quality.

Decipher illegible writing

OCR (Optical Character Recognition) is only a small part of GPT-4V, but its capabilities are immense. Historians might be interested to know that GPT-4V can decipher and translate historical manuscripts. "The humanities are about to change in a major way," says researcher Ethan Mollick after using GPT-4V to convert, translate, and analyze Robert Hooke's centuries-old notes.

Image: Ethan Mollick
Image: Ethan Mollick

How to use GPT-4V?

GPT-4V requires a paid ChatGPT Plus membership for $20 per month. Once you have that, you can upload images via the website and the smartphone app. The app allows you to upload multiple images at once and highlight specific areas of an image. OpenAI is currently rolling out GPT-4V in phases, so even if you have a paid membership, you may not have access to it yet.

Summary
  • OpenAI's ChatGPT now includes GPT-4V(ision), which analyzes visual content such as graphics and photos, allowing users to combine image and text prompts.
  • Early examples show that GPT-4V can be used for tasks such as front-end web development, explaining visuals such as comics and infographics, providing interior design advice, and deciphering illegible writing in historical manuscripts.
  • Access to GPT-4V requires a paid ChatGPT Plus membership, priced at $20 per month, and is currently being rolled out in phases to users, who can upload images via the website and smartphone app.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.