
Researchers at Meta have developed new photorealistic synthetic datasets using Unreal Engine that enable more controlled and robust evaluation and training of AI vision systems.

Meta researchers have introduced a family of synthetic image datasets called PUG (Photorealistic Unreal Graphics) that aim to provide new capabilities for the evaluation and training of AI vision systems. They use the Unreal Engine, a state-of-the-art real-time 3D graphics engine, to render photorealistic image data.

While synthetic datasets have been created before, the researchers say they often lacked realism, limiting their usefulness. By leveraging the photorealism of Unreal Engine, the PUG datasets aim to bridge the gap between synthetic and real-world data.

The researchers present four PUG datasets:

  • PUG: Animals contains over 200,000 images of animals in various poses, sizes, and environments. It can be used to study out-of-distribution robustness and model representations.
  • PUG: ImageNet contains over 90,000 images and serves as an additional robustness test set for ImageNet models, with a rich set of factor changes such as pose, background, size, texture, and lighting.
  • PUG: SPAR contains over 40,000 images for evaluating vision-language models on scene, position, attribute, and relation understanding.
  • PUG: AR4T provides approximately 250,000 images for fine-tuning vision-language models on spatial relations and attributes.
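
The datasets are distributed as rendered images with per-image factor annotations. As a rough illustration only, the sketch below iterates over a locally downloaded copy of PUG: Animals with PyTorch, assuming the images have been extracted into a class-per-folder layout; the local path is a placeholder, not the official download format.

```python
# Minimal sketch: iterating over a locally downloaded PUG dataset with PyTorch.
# Assumes the images sit in a class-per-folder layout under ./PUG_Animals/ --
# the path and folder layout are placeholders, not the official release format.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# ImageFolder treats each subdirectory as one class label.
pug_animals = datasets.ImageFolder("./PUG_Animals", transform=preprocess)
loader = DataLoader(pug_animals, batch_size=64, shuffle=False, num_workers=4)

for images, labels in loader:
    # images: (batch, 3, 224, 224) tensor, labels: class indices
    print(images.shape, labels[:5])
    break
```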

PUG already reveals weak robustness in leading ImageNet models

In addition to the datasets, researchers can use the PUG environments to create their own data, precisely specifying factors such as lighting and viewpoint that are difficult to control in real-world datasets. The ability to generate data that covers a range of domains enables more reliable evaluation and training of vision-language models compared to existing benchmarks, the team writes.
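
To give a sense of what this kind of factor control looks like in practice, here is a purely illustrative sketch that enumerates a grid of factor combinations so each one can be varied in isolation. The factor names and the render_scene() call are hypothetical placeholders, not the actual PUG environment interface.

```python
# Illustrative only: enumerate every combination of a few factors of variation,
# so a controlled dataset covers each factor change in isolation.
from itertools import product

factors = {
    "background": ["grass", "desert", "studio"],   # hypothetical factor values
    "lighting":   ["noon", "sunset", "overcast"],
    "camera_yaw": [0, 45, 90, 135, 180],
    "scale":      [0.5, 1.0, 1.5],
}

grid = list(product(*factors.values()))
print(f"{len(grid)} controlled configurations")  # 3 * 3 * 5 * 3 = 135

for values in grid:
    config = dict(zip(factors.keys(), values))
    # render_scene(config)  # placeholder for an actual Unreal Engine render call
```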

Video: Meta

In experiments, the researchers demonstrate PUG's ability to benchmark model robustness and representation quality: PUG showed that the top-performing models on ImageNet were not necessarily the most robust to factors such as pose and lighting. It also allowed the researchers to study how different vision-language models capture relationships between images and text.
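
A simple way to run this kind of per-factor robustness check is to score a pretrained classifier on PUG images and break accuracy down by one factor at a time. The sketch below assumes a metadata file with filename, label, and lighting columns; those names are assumptions standing in for the actual PUG annotations, not the paper's exact evaluation code.

```python
# Sketch of a per-factor robustness check: score a pretrained classifier on
# PUG images and group accuracy by a single factor such as lighting.
# The metadata file and its column names (filename, label, lighting) are assumed.
import pandas as pd
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

meta = pd.read_csv("pug_imagenet_metadata.csv")  # label assumed to be an ImageNet class index

records = []
with torch.no_grad():
    for row in meta.itertuples():
        img = preprocess(Image.open(row.filename).convert("RGB")).unsqueeze(0)
        pred = model(img).argmax(dim=1).item()
        records.append({"lighting": row.lighting, "correct": pred == row.label})

# Accuracy per lighting condition shows which factor changes hurt the model most.
print(pd.DataFrame(records).groupby("lighting")["correct"].mean())
```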

More information and data are available on the PUG project website.

Summary
  • Researchers at Meta have created a family of synthetic image datasets called PUG (Photorealistic Unreal Graphics) using Unreal Engine, aimed at improving AI vision systems' evaluation and training.
  • The PUG datasets offer photorealistic images that bridge the gap between synthetic and real-world data, including over 200,000 animal images, more than 90,000 images for ImageNet robustness testing, and further sets for evaluating and fine-tuning vision-language models.
  • The PUG environment allows for the creation of user-specific datasets, offering control over factors like lighting and viewpoint, and enabling more reliable evaluation and training of AI vision systems compared to existing benchmarks.