Content
summary Summary

OpenAI has made its multimodal image generation model, GPT-Image-1, available to developers through the API. Previously limited to ChatGPT, the model is now being adopted by companies including Adobe and Figma.

Ad

According to OpenAI, the model generated over 700 million images for well over 130 million users during its first week in ChatGPT. With wider availability through the API, that number is likely to grow.

Ghibli hype aside, OpenAI's model is known for its highly accurate prompt tracking, which is much more precise than other available image models. In direct comparison, the new Midjourney-v7 has shown significantly weaker performance.

AI images cost only pennies

Image processing with gpt-image-1 is billed by tokens. The API pricing structure distinguishes between text tokens, image input tokens, and image output tokens. Text tokens are priced at $5 per million, image input tokens at $10 per million, and image output tokens at $40 per million. Depending on the selected image quality, costs typically range from $0.02 to $0.19 per image.

Ad
Ad

For GPT-4.1 and GPT-4o, token usage depends on both image size and the chosen detail level. A flat rate of 85 tokens is charged for "detail: low." For "detail: high," the image is divided into 512-pixel tiles, each adding 170 tokens to the base rate. For example, a 1024×1024 image with high detail requires 765 tokens (four tiles plus 85 tokens).

Other models, such as GPT-4.1-mini, use a calculation based on 32×32 pixel patches, with a maximum of 1,536 image tokens. Larger images, such as 1800×2400 pixels, are scaled before processing to fit within the token limit.

Quality Square (1024×1024) Portrait (1024×1536) Landscape (1536×1024)
Low 272 tokens 408 tokens 400 tokens
Medium 1056 tokens 1584 tokens 1568 tokens
High 4160 tokens 6240 tokens 6208 tokens

Images can be provided via direct URLs or as Base64-encoded data. The API accepts PNG, JPEG, WEBP, and non-animated GIF formats up to 20 MB. At high detail, images are scaled to a maximum resolution of 768×2000 pixels.

The model can interpret visual content such as objects, colors, shapes, and embedded text. However, there are limitations with small text, non-Latin fonts, rotated images, or complex diagrams, according to OpenAI. The technology is not suitable for medical images, CAPTCHAs, or tasks that require high spatial precision. Interpretations are generally approximate, for example when counting objects or identifying positions. Images containing watermarks, text, or NSFW content are not accepted. The "detail" parameter controls the level of analysis, with options for "low," "high," or "auto."

In addition to image generation via the Images API, the model can also analyze images. The Chat Completions API and Responses API can process images as input and generate textual output. Support for image generation via the Responses API is planned.

Recommendation

Organizations may be required to complete verification to activate the model. Details on access management are available in the organization settings. Developers can test the model using the Playground or consult the official Image Generation Guide.

The model uses the same safety mechanisms as ChatGPT-4o's image generation, including content filters and C2PA metadata for origin verification. Filter strength is adjustable via the "moderation" parameter. OpenAI states that no customer data from the API is used for training. All usage is subject to OpenAI's API usage guidelines.

Early adoption by commercial platforms

According to OpenAI, companies such as Adobe (Creative Cloud), Figma (design platform), Airtable (workflow automation), Wix (website design), and Photoroom (e-commerce visuals) are already using the API in production. Adobe is incorporating image generation into its Firefly and Express applications to expand creative style options.

Other companies, including Gamma, HeyGen, OpusClip, and Quora, use the model for applications such as presentation graphics, avatar creation, YouTube thumbnails, and as a general image generator. Instacart is experimenting with recipe images, and Invideo is testing the technology for video editing.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • OpenAI has released its multimodal image generation model "gpt-image-1"—previously exclusive to ChatGPT—through an API, expanding developer access.
  • Pricing varies by usage: text, image input, and image output tokens are billed separately, with image generation costing between $0.02 and $0.19 depending on quality.
  • The API supports not only creating images but also analyzing and processing them, with options for adjusting safety filters and moderation; OpenAI states that data from API customers will not be used for training.
Sources
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.