Anonymous researchers have presented a specialized AI model called VideoGameBunny in a new paper. VideoGameBunny is a vision-language model that can understand images and answer questions about video games based on screenshots. While this technology could make gaming more accessible, it also has significant potential for abuse in competitive settings.
The open-source multimodal model is based on the Bunny architecture and was trained on an extensive dataset of over 185,000 screenshots from 413 games collected from YouTube using the search term "gameplay walkthroughs." Bunny was developed by an AI research group at the Beijing Academy of Artificial Intelligence and presented in a paper in February.
Hundreds of thousands of text-image pairs for training
For training, the researchers generated nearly 390,000 image-text pairs using Gemini 1.0 Pro, Gemini 1.5 Pro, GPT-4V, Llama 3, and GPT-4o. The pairs include long and short captions, question-answer sets, and structured JSON descriptions of visual elements.
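To make the mix of annotation types more concrete, here is a minimal Python sketch of what a single training record could look like. The class name, every field name, and the example values are illustrative assumptions; the article does not describe the dataset's actual schema.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ScreenshotSample:
    """One hypothetical training record (all field names are assumptions)."""
    image_path: str          # path to the screenshot file
    game: str                # title of the game the screenshot comes from
    short_caption: str       # brief description of the scene
    long_caption: str        # detailed description of the scene
    qa_pairs: list = field(default_factory=list)     # question-answer sets
    structured: dict = field(default_factory=dict)   # JSON-style description of visual elements


def to_json_line(sample: ScreenshotSample) -> str:
    """Serialize a record as one line of JSON, a common format for training data."""
    return json.dumps(asdict(sample), ensure_ascii=False)


# Made-up example values, for illustration only.
sample = ScreenshotSample(
    image_path="screenshots/example_0001.jpg",
    game="Example Game",
    short_caption="A character stands in a forest clearing.",
    long_caption="A character in armor stands in a forest clearing at dusk; "
                 "a health bar is visible in the top-left corner.",
    qa_pairs=[{"question": "Where is the health bar?", "answer": "Top-left corner"}],
    structured={"hud": {"health_bar": "top-left"}, "characters": ["player"]},
)

print(to_json_line(sample))
```

Keeping captions, question-answer sets, and the structured description in one record per screenshot is just one possible layout; storing each annotation type in its own file would work equally well.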
In a benchmark with multiple-choice questions about video game images, VideoGameBunny achieved an accuracy of 85.1 percent, compared to 83.9 percent for LLaVA-1.6-34b, a much larger open-source model that was not trained specifically on games. VideoGameBunny showed particular strengths in recognizing game-specific anomalies and understanding HUD information.
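Accuracy on a multiple-choice benchmark is simply the share of questions for which the model picks the correct option. The following Python sketch shows that calculation; the function name and the toy answers are assumptions for illustration and do not reproduce the researchers' evaluation code or data.

```python
def multiple_choice_accuracy(predictions, answers):
    """Return the percentage of predictions that match the correct option letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Compare option letters case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Toy data, not taken from the paper: three of four answers are correct.
model_predictions = ["A", "c", "B", "D"]
correct_answers = ["A", "C", "D", "D"]

print(f"{multiple_choice_accuracy(model_predictions, correct_answers):.1f} percent")
# prints "75.0 percent"
```

With only a 1.2 percentage point gap between the two reported scores, the size and composition of the question set matter when interpreting such a comparison.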
In one example, the models were asked whether a game scene showed any glitches or errors. Only VideoGameBunny correctly answered that it did not. The unmodified Bunny model flagged a glowing ball in the left half of the screen as a problem, while LLaVA claimed that the download bar at the top right was stuck.
VideoGameBunny could help cheaters
To encourage further research, the researchers have made VideoGameBunny's source code, training data, and logs publicly available. In addition to the 8-billion-parameter model, there is also a smaller version with only 4 billion parameters.
Recently, there have been several efforts to have AI models play games on their own or assist human players with commentary. In May, Microsoft demonstrated how its Copilot can assist inexperienced players in Minecraft.
However, VideoGameBunny seems to take a more holistic approach than previous solutions due to its extensive training material. Instead of specializing in just one game, it could become a general gaming assistant.
The researchers see their model as a first step toward an AI assistant that can perform tasks such as playing, commenting on, and debugging games. However, they are also aware that they could enable cheating: "As AI models becomes more adept at understanding game contents, there is a risk that they could be used to create sophisticated cheating tools." Many such tools already exist, but models like VideoGameBunny could open up new use cases.