
Anonymous researchers have presented a specialized AI model called VideoGameBunny in a new paper. VideoGameBunny is a vision-language model that can understand images and answer questions about video games based on screenshots. While this technology could make gaming more accessible, it also has significant potential for abuse in competitive settings.


The open-source multimodal model is based on the Bunny architecture and was trained on an extensive dataset of over 185,000 screenshots from 413 games collected from YouTube using the search term "gameplay walkthroughs." Bunny was developed by an AI research group at the Beijing Academy of Artificial Intelligence and presented in a paper in February.

Hundreds of thousands of text-image pairs for training

For training, the researchers generated nearly 390,000 image-text pairs using Gemini 1.0 Pro, Gemini 1.5 Pro, GPT-4V, LLaMA-3, and GPT-4o. The pairs include long and short captions, question-answer sets, and structured JSON descriptions of visual elements.
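
To make these kinds of training text more concrete, here is a minimal Python sketch of how such image-text pairs could be represented. The field names, the `kind` labels, and the sample values are assumptions for illustration only; the article does not describe the dataset's actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical labels for the kinds of text mentioned in the article.
KINDS = {"short_caption", "long_caption", "qa", "json_description"}


@dataclass
class ImageTextPair:
    image_path: str  # hypothetical path to a screenshot
    kind: str        # one of KINDS
    text: str        # caption, Q&A text, or a JSON string
    generator: str   # name of the model that produced the text

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.kind == "json_description":
            json.loads(self.text)  # raises ValueError if the text is not valid JSON


# Made-up sample record, not taken from the real dataset.
pair = ImageTextPair(
    image_path="screenshots/example_0001.jpg",
    kind="json_description",
    text=json.dumps({"hud": {"health_bar": "visible"}, "characters": []}),
    generator="example-model",
)
```

Validating the JSON when the record is created would catch malformed model output early, before it reaches training.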

Image: VideoGameBunny

In a benchmark of multiple-choice questions about video game images, VideoGameBunny achieved an accuracy of 85.1 percent, compared to 83.9 percent for LLaVA-1.6-34b, a much larger open-source model trained for general-purpose use. VideoGameBunny was particularly strong at recognizing game-specific anomalies and understanding HUD information.
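
Accuracy on a multiple-choice benchmark is simply the share of questions where the model's chosen option matches the answer key. The following Python sketch shows that calculation; the function name and the toy data are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def accuracy(predictions, answers):
    """Return the fraction of predicted options that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty benchmark")
    # Compare option letters case-insensitively, ignoring stray whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical toy data: four questions, three answered correctly.
answer_key = ["A", "C", "B", "D"]
model_output = ["A", "c", "B", "A"]
score = accuracy(model_output, answer_key)
print(f"{score:.1%}")  # prints 75.0%
```

On this scale, the reported gap between 85.1 and 83.9 percent is 1.2 percentage points.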


When asked whether this game scene shows any glitches or errors, only VideoGameBunny correctly answered that it does not. The unmodified Bunny model flagged the glowing ball in the left half of the screen as a problem, while LLaVA claimed that the download bar at the top right was stuck.

Image: VideoGameBunny

VideoGameBunny could help cheaters

To encourage further research, the researchers have made VideoGameBunny's source code, training data, and logs publicly available. In addition to the 8-billion-parameter model, there is a smaller version with 4 billion parameters.

Recently, there have been a number of efforts to have AI models play games on their own or assist humans with comments. In May, Microsoft demonstrated the ability of its Copilot to assist inexperienced players in Minecraft.

However, VideoGameBunny seems to take a more holistic approach than previous solutions due to its extensive training material. Instead of specializing in just one game, it could become a general gaming assistant.

The researchers see their model as a first step toward an AI assistant that can perform tasks such as playing, commenting on, and debugging games. However, they are also aware that they could enable cheating: "As AI models becomes more adept at understanding game contents, there is a risk that they could be used to create sophisticated cheating tools." Many such tools already exist, but models like VideoGameBunny could open up new use cases.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Researchers have developed an AI model called VideoGameBunny that specializes in understanding video games. It is based on the open-source Bunny architecture and was trained on over 185,000 screenshots and 390,000 image-text pairs.
  • In a benchmark of multiple-choice questions about video game images, VideoGameBunny reached 85.1 percent accuracy, outperforming the larger, general-purpose LLaVA model. It performed particularly well in recognizing game-specific anomalies and understanding HUD information.
  • The researchers see potential for AI game assistants but are also aware of the risk of abuse for cheating. To enable further research, they have made the source code, training data, and models publicly available.
Jonathan works as a technology journalist who focuses primarily on how easily AI can already be used today and how it can support daily life.