
Stable Diffusion meets reinforcement learning: how to effectively fine-tune generative image models for downstream tasks.


Diffusion models are now standard in image synthesis and are also used to generate artificial proteins, where they can aid in drug design. The diffusion process gradually converts random noise into a structured output, such as an image or a protein structure.

During training, diffusion models learn to reverse this noising process, reconstructing content from the training data step by step. Researchers are now intervening in this process with reinforcement learning, fine-tuning generative AI models toward specific goals, such as improving the aesthetic quality of images. The approach is inspired by the fine-tuning of large language models, like OpenAI's ChatGPT.
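
To make that training objective concrete, here is a minimal sketch of a standard denoising-diffusion training step (the common DDPM recipe, not code from the paper); `model`, `x0`, and `alphas_cumprod` are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0, alphas_cumprod):
    """One standard DDPM training step: the model learns to predict the
    noise that was mixed into a clean training image x0 ([B, C, H, W])."""
    # Pick a random diffusion timestep for each image in the batch.
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    # Forward process: blend the clean image with noise per the schedule.
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # Train the model to recover the injected noise from the noisy image.
    return F.mse_loss(model(x_t, t), noise)
```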

Reinforcement learning for more aesthetic images?

A new paper from Berkeley Artificial Intelligence Research (BAIR) examines how effectively diffusion models can be fine-tuned toward different goals with reinforcement learning, using an algorithm called Denoising Diffusion Policy Optimization (DDPO).
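
At its core, DDPO treats the step-by-step denoising process as a sequential decision problem and nudges the model toward denoising trajectories that end in high-reward images. The following is a heavily simplified, hypothetical sketch of such a policy-gradient update; `pipeline.sample_with_log_probs` and `reward_fn` are placeholders, not the authors' actual API:

```python
import torch

def ddpo_update(pipeline, prompts, reward_fn, optimizer):
    # Sample images while recording the log-probability of every denoising
    # step (hypothetical helper; log_probs has shape [batch, num_steps]).
    images, log_probs = pipeline.sample_with_log_probs(prompts)
    rewards = torch.tensor([reward_fn(img) for img in images])
    # Normalize rewards across the batch to reduce gradient variance.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # REINFORCE-style loss: raise the log-probability of denoising steps
    # that produced high-reward images, lower it for low-reward ones.
    loss = -(advantages[:, None] * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```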


The team trains Stable Diffusion on four tasks:

  • Compressibility: How easy is the image to compress using the JPEG algorithm? The reward is the negative file size of the image (in kB) when saved as a JPEG.
  • Incompressibility: How hard is the image to compress using the JPEG algorithm? The reward is the positive file size of the image (in kB) when saved as a JPEG.
  • Aesthetic Quality: How aesthetically appealing is the image to the human eye? The reward is the output of the LAION aesthetic predictor, which is a neural network trained on human preferences.
  • Prompt-Image Alignment: How well does the image represent what was asked for in the prompt? This one is more involved: the image is fed into LLaVA, which is asked to describe it, and the similarity between that description and the original prompt is then computed with BERTScore (see the reward sketch after this list).
LLaVA helps bring the prompt and image closer together. | Image: BAIR
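
As an illustration, the rewards above could be computed along these lines. This is a minimal sketch under stated assumptions (Pillow for JPEG encoding of an RGB image, the public bert-score package, and a hypothetical `caption_image` stand-in for the LLaVA query), not BAIR's actual code:

```python
import io
from PIL import Image
from bert_score import score

def compressibility_reward(image: Image.Image) -> float:
    # Reward = negative JPEG file size in kB: smaller files score higher.
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    return -buffer.tell() / 1000

def incompressibility_reward(image: Image.Image) -> float:
    # Reward = positive JPEG file size in kB: harder-to-compress scores higher.
    return -compressibility_reward(image)

def alignment_reward(image, prompt: str, caption_image) -> float:
    # `caption_image` is a hypothetical stand-in for asking LLaVA to
    # describe the image; the reward is the BERTScore F1 between that
    # description and the original prompt.
    description = caption_image(image)
    _, _, f1 = score([description], [prompt], lang="en")
    return f1.item()
```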

In their tests, the team showed that DDPO can effectively optimize all four tasks. They also observed some generalization: the optimizations for aesthetic quality and prompt-image alignment were trained on prompts covering 45 common animal species, yet also transferred to unseen animal species and to depictions of inanimate objects.

Video: BAIR

New method does not require training data

As is common in reinforcement learning, DDPO also exhibits reward overoptimization: past a certain point, the model destroys all meaningful image content in every task in order to maximize its reward. How to prevent this remains an open question for future work.

Image: BAIR

Still, the method is promising: "What we’ve found is a way to effectively train diffusion models in a way that goes beyond pattern-matching — and without necessarily requiring any training data. The possibilities are limited only by the quality and creativity of your reward function."


More information and examples are available on BAIR's DDPO project page.

Summary
  • Researchers at Berkeley Artificial Intelligence Research (BAIR) are using reinforcement learning to further optimize generative AI models for images.
  • In tests, Denoising Diffusion Policy Optimization (DDPO) proved effective at optimizing (in)compressibility, aesthetic quality, and prompt-image alignment.
  • The method requires no training data and opens up new possibilities for AI-based image synthesis, but needs to be further explored.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.