CosmicMan is a new AI image model optimized for generating images of people

Researchers at the Shanghai AI Laboratory have developed a specialized text-to-image model for photorealistic generation of human images. Thanks to a massive dataset and a new training method, CosmicMan outperforms current state-of-the-art models, according to the researchers.

Scientists at the Shanghai AI Laboratory present CosmicMan, a novel text-to-image foundation model that specializes in generating high-quality images of people.

Unlike current image foundation models, which often struggle to generate detailed human images that match the text description, CosmicMan is designed to produce photorealistic results with precise text-image alignment. Users can even specify small details in their prompt, such as a different color for a hat.

Data production as a feedback loop between humans and AI

According to the researchers, led by Shikai Li and Jianglin Fu, CosmicMan's success rests on two pillars: a huge, high-quality dataset and a novel framework for training the AI model.

For CosmicMan, the scientists developed a new approach to generating training data that they call "Annotate Anyone." It works as a kind of feedback loop between humans and the AI, and aims to provide high-quality, always-up-to-date data at low cost. In this approach, the AI first generates detailed labels, which are then reviewed and optimized by humans.

The researchers built a massive dataset of human images cooperatively labeled by AI and humans. | Image: Li et al.
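The loop described above can be pictured roughly as follows. This is a minimal sketch of a generic human-in-the-loop labeling pass, not the researchers' actual pipeline: the `Sample` class, `annotate_round`, and the two toy labeling functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sample:
    """One image with its labels (hypothetical structure, for illustration only)."""
    image_id: str
    labels: dict = field(default_factory=dict)
    reviewed: bool = False


def annotate_round(
    samples: list,
    ai_label: Callable[[str], dict],
    human_review: Callable[[str, dict], dict],
) -> int:
    """Run one pass: the AI proposes labels, a human reviews and corrects them.

    Returns how many samples the human had to change.
    """
    corrections = 0
    for sample in samples:
        proposed = ai_label(sample.image_id)
        final = human_review(sample.image_id, proposed)
        if final != proposed:
            corrections += 1
        sample.labels = final
        sample.reviewed = True
    return corrections


# Toy stand-ins for the AI labeler and the human reviewer (assumptions).
def toy_ai_label(image_id: str) -> dict:
    return {"hat": "red", "top": "t-shirt"}


def toy_human_review(image_id: str, proposed: dict) -> dict:
    fixed = dict(proposed)
    if image_id == "img_002":
        fixed["hat"] = "blue"  # the reviewer corrects a wrong color
    return fixed


samples = [Sample("img_001"), Sample("img_002")]
corrections = annotate_round(samples, toy_ai_label, toy_human_review)
print(f"{corrections} of {len(samples)} samples needed a human correction")
```

In a real pipeline, the corrected labels would presumably be used to improve the labeling model for the next round. The article does not say how this step works, so the sketch only counts the corrections.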

Using this method, the team created the "CosmicMan-HQ 1.0" dataset, which contains six million images of humans at an average resolution of 1488 x 1255 pixels. The images are annotated with precise text descriptions derived from 115 million attributes of varying levels of detail.
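To put these figures in perspective, a quick back-of-the-envelope calculation shows how many attributes each image carries on average and roughly how large a typical image is. The numbers come from the article; treating the average side lengths as those of a single typical image is a simplifying assumption.

```python
# Figures as reported for CosmicMan-HQ 1.0.
num_images = 6_000_000
num_attributes = 115_000_000
side_a, side_b = 1488, 1255  # average resolution in pixels

attributes_per_image = num_attributes / num_images
# Approximation: multiplies the two average side lengths.
avg_megapixels = side_a * side_b / 1_000_000

print(f"{attributes_per_image:.1f} attributes per image")
print(f"{avg_megapixels:.2f} megapixels per image")
```

That works out to about 19 attributes per image at just under two megapixels.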

Focusing on the human

The second element is the "Decomposed-Attention-Refocusing" framework (Daring), which, simply put, sorts the words in a prompt into groups that correspond to parts of the human body, such as "head," "arms," "legs," and so on.

This allows the AI model to focus on drawing each part of the person individually, rather than trying to draw everything at once. This leads to better and more easily customizable results.

The human-specialized AI model can focus on individual body parts. This offers higher accuracy and flexibility. | Image: Li et al.
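The grouping idea can be illustrated with a toy example. The sketch below only shows how prompt phrases might be sorted into body-part groups using a simple keyword lookup. It is not the Daring framework itself, which works inside the image model; the keyword table and the function name `group_prompt_by_body_part` are assumptions made for illustration.

```python
import re

# Hypothetical keyword table; the categories used by the researchers may differ.
BODY_PART_KEYWORDS = {
    "head": {"hat", "hair", "face", "glasses"},
    "upper body": {"shirt", "jacket", "arms", "sleeves"},
    "lower body": {"jeans", "skirt", "legs", "trousers"},
    "feet": {"shoes", "boots", "sneakers"},
}


def group_prompt_by_body_part(prompt: str) -> dict:
    """Assign each comma-separated phrase to the first body part it mentions."""
    groups = {part: [] for part in BODY_PART_KEYWORDS}
    groups["other"] = []
    for phrase in re.split(r",\s*", prompt.strip()):
        if not phrase:
            continue
        words = set(re.findall(r"[a-z]+", phrase.lower()))
        for part, keywords in BODY_PART_KEYWORDS.items():
            if words & keywords:
                groups[part].append(phrase)
                break
        else:
            # No keyword matched, so the phrase describes the scene as a whole.
            groups["other"].append(phrase)
    return groups


prompt = "a woman in a park, green hat, denim jacket, black jeans, white sneakers"
groups = group_prompt_by_body_part(prompt)
for part, phrases in groups.items():
    print(f"{part}: {phrases}")
```

Once phrases are grouped like this, changing "green hat" to "red hat" touches only the "head" group, which hints at why a per-part treatment makes small edits easier to control.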

In various experiments, CosmicMan shows promising results, outperforming current state-of-the-art models in both quantitative metrics and perceived visual quality, the researchers say. They see great potential for using CosmicMan in various applications, such as the entertainment industry, e-commerce, or avatar creation for virtual worlds.

The CosmicMan-HQ 1.0 dataset will be released soon. The team intends to keep improving the model and is already planning the next version.
