AI is supposed to make it possible for non-programmers to create complex programs like games. A group of researchers put this claim to the test by generating the game Flappy Bird 35 times using Python and ChatGPT.
The results of the DiverSE research group show that AI programming is not as easy as it is sometimes made out to be. But it has potential. The experiment doesn't provide a clear answer to whether non-programmers can suddenly just generate fully functional games with AI, but it does show that the process is neither simple nor solved.
Flappy Bird "Prompt Edition"
The study focused on developing the game Flappy Bird using Python as the programming language. The researchers explored several methods using GPT-3.5 and GPT-4 via ChatGPT, testing different prompts and strategies:
- Writing a short prompt that simply describes the game,
- Giving ChatGPT a more detailed list of features,
- Providing a short description of the most important features,
- Providing a complete code example and asking for a matching prompt, and
- Using a series of prompts to generate code without reviewing the code in between.
None of these approaches yielded a magic prompt that always produced a playable game. Some trials produced playable games without additional technical intervention, but many sessions resulted in unusable games that required code fixes.
Interestingly, even with the same prompts, ChatGPT generated substantially different versions of code that produced completely different results. The team speaks of significant inconsistencies in the quality of the output, sometimes even leading to dead ends.
Amateur coding with ChatGPT? Yes, but…
The test shows that laypeople with minimal technical knowledge can sometimes create games with ChatGPT, but that doing so reliably is hardly possible without programming knowledge. Often (but not always), direct intervention in the code was necessary to fix bugs.
If a game was in a bad state from the beginning, it was even harder to fix it without programming knowledge because of the lack of visual feedback. If you can't read code, you can only point out bugs that are visible in the game. But if the game does not work, this possibility disappears.
During the experiment, ChatGPT often split the problem into parts and inserted placeholders into the code without ever filling in the implementation. This can be useful for developers, who can complete the missing parts themselves, but for an end user the game remains incomplete and non-functional.
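To illustrate what such placeholders can look like, here is a minimal sketch in Python. The game excerpt and all function names (`apply_gravity`, `check_collision`, `draw_pipes`, `find_placeholder_functions`) are hypothetical and not taken from the study's sessions. The sketch uses Python's standard `ast` module to list functions whose bodies do no real work, which is one way a person who cannot read code might at least spot unfinished parts.

```python
import ast

# Hypothetical excerpt of a generated game (not from the study):
# the structure is there, but two functions are still placeholders.
GENERATED_SOURCE = '''
def apply_gravity(bird_y, velocity, gravity=0.5):
    velocity += gravity
    return bird_y + velocity, velocity

def check_collision(bird_rect, pipes):
    # TODO: implement collision detection
    pass

def draw_pipes(screen, pipes):
    """Draw all pipes on the screen."""
    ...
'''


def is_placeholder_stmt(stmt):
    """Return True if a statement does no real work."""
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        # Covers docstrings and a bare `...` (Ellipsis).
        return True
    if isinstance(stmt, ast.Raise) and stmt.exc is not None:
        exc = stmt.exc
        target = exc.func if isinstance(exc, ast.Call) else exc
        return isinstance(target, ast.Name) and target.id == "NotImplementedError"
    return False


def find_placeholder_functions(source):
    """List names of functions whose body consists only of placeholders."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and all(is_placeholder_stmt(s) for s in node.body)
    ]


print(find_placeholder_functions(GENERATED_SOURCE))
# prints ['check_collision', 'draw_pipes']
```

A check like this only flags empty function bodies. It cannot tell whether the implemented parts of a game actually work, so it does not replace running and playing the game.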
Overall, the team emphasizes the benefits of ChatGPT for programming, including inspiration for new variations of a game or unique features, the use of generated code as a starting point, and the basic ability for end users to create interesting games that sometimes work.
The team suggests further approaches, such as changing the programming language, finding better prompts or a language to better control ChatGPT, and improving the integration of ChatGPT output into the development environment to save time.
All 35 sessions with prompts, code, observations, and results using GPT-3.5 and GPT-4 via ChatGPT are available on GitHub. The video below shows some of the experiments and their results.