How useful are AI tutors for learning? A study with 1,200 participants investigates whether GPT-4 can help with math education.
ChatGPT and other AI tools that use large language models are used by students and teachers. A first comprehensive study now examines whether these tools lead to learning success.
The paper "Math Education with Large Language Models: Peril or Promise?" by researchers at the University of Toronto and Microsoft Research looks specifically at the impact GPT-4 can have on learning mathematics.
The study involved nearly 1,200 people who were asked to solve mathematical problems.
GPT-4 math tutoring shows positive impact
The experiment consisted of two phases: a learning phase, in which the participants either tried the tasks themselves or saw the correct answers first, and a test phase, in which they had to solve similar tasks without help.
Participants received one of three types of explanation: no explanation, a GPT-4-generated explanation, or a customized GPT-4 explanation with tailored problem-solving strategies.
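The design described above can be pictured as a grid of experimental conditions. The following is only a rough sketch, assuming the two practice modes and the three explanation types were fully crossed, which the description suggests but does not state outright. The labels and the function name `build_conditions` are hypothetical and do not come from the paper.

```python
from itertools import product

# Hypothetical labels: the article describes these factors but does not name them.
PRACTICE_MODES = ["try_first", "see_answer_first"]
EXPLANATION_TYPES = ["none", "gpt4_stock", "gpt4_customized"]


def build_conditions():
    """Return every (practice mode, explanation type) pair.

    Assumption: the two factors are fully crossed, so each practice
    mode is combined with each explanation type.
    """
    return list(product(PRACTICE_MODES, EXPLANATION_TYPES))


conditions = build_conditions()
for practice, explanation in conditions:
    print(f"{practice} + {explanation}")
```

Under that assumption the grid has six cells, and every cell is followed by the same unaided test phase.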

In the test phase, all participants answered new questions to check whether they had understood the underlying concepts from the examples.
The team found that the GPT-4 explanations improved learning outcomes compared to simply showing the correct answers. Participants who tried the problems before seeing the explanations showed the greatest learning gains. This positive effect also persisted for participants who saw the explanations before trying to solve the problems.

According to the researchers, an accompanying qualitative analysis showed that these performance improvements came from participants adopting the strategies outlined in the explanations. Participants also reported that the explanations made the test tasks feel less difficult.
Large study with major limitations
However, the researchers also point out some limitations of the study. For example, it measured only short-term learning and focused on SAT math questions with multiple-choice answers. In addition, the study was conducted in a controlled environment via Amazon Mechanical Turk, so the impact of AI explanations would still need to be studied in broader educational contexts and over longer periods. The researchers also did not test whether a group given conventional textbook explanations would have performed better.
Future work should therefore investigate how AI support affects learning in other learning domains, contexts, and response formats. In addition, the negative effects of over-reliance combined with potentially incorrect responses in an educational context should be investigated.