
A new Microsoft Research study shows that using generative AI systems effectively requires strong metacognitive abilities—our capacity to monitor and control our own thoughts. The researchers found that many people struggle with the cognitive demands of professional AI use and proposed specific improvements.


The professional and systematic use of generative AI systems cognitively overwhelms many people. According to a new study by Microsoft Research, this is not only due to the complexity of the systems themselves, but above all to the high demands placed on our metacognition.

"The metacognitive demands of working with GenAI systems parallel those of a manager delegating tasks to a team," the researchers write. "A manager needs to clearly understand and formulate their goals, break down those goals into communicable tasks, confidently assess the quality of the team’s output, and adjust plans accordingly along the way."

Three main cognitive challenges

First, in prompting—formulating instructions for the AI. Users need to clearly identify their goals and break them into subtasks. Many aspects that remain implicit in manual work—like the desired tone of an email—must be communicated explicitly to the AI. This becomes particularly evident when working with generative models systematically rather than through casual chat: individual work steps must first be thought through, put in sequence, and translated into instructions for the model.
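To make this concrete, the following Python sketch shows what such a decomposition might look like in code: the task is split into explicit, ordered instructions, and each step's output feeds the next. The `call_model` function is a stand-in for whatever model client you actually use, not a real API.

```python
# Sketch: decomposing an implicit writing task into explicit, sequenced prompts.
# `call_model` is a placeholder for whatever model client you actually use
# (it returns a canned string here so the sketch runs end to end).

def call_model(prompt: str) -> str:
    """Placeholder: send a prompt to a generative model, return its text reply."""
    return f"[model reply to: {prompt[:40]}...]"

def draft_email(facts: str) -> str:
    # Each step makes a normally implicit requirement explicit (key points, length, tone).
    steps = [
        f"List the three key points a customer must learn from these facts:\n{facts}",
        "Draft a short email (under 120 words) covering those points.",
        "Rewrite the draft in a friendly but formal tone for a long-term customer.",
    ]
    result = ""
    for step in steps:
        # Feed the previous output forward so each instruction builds on the last.
        prompt = f"{step}\n\nPrevious output:\n{result}" if result else step
        result = call_model(prompt)
    return result

print(draft_email("Delivery delayed by two weeks; a 10 percent discount applies."))
```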


Second, in evaluating AI outputs. Unlike search engines, GenAI responses aren't deterministic and can vary with identical queries. This requires "well-adjusted confidence" in one's evaluation abilities. Other studies show that domain expertise makes it easier to assess AI output quality quickly and accurately.
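One practical way to calibrate that confidence is to sample the same prompt several times and check how much the answers disagree before relying on any single one. The sketch below uses a hypothetical `call_model` stub that simulates varying replies; swap in a real model call to try it.

```python
# Sketch: sample the same prompt several times and compare the answers before
# trusting any single one. The stub simulates non-deterministic replies.

import random
from collections import Counter

def call_model(prompt: str) -> str:
    """Placeholder that simulates varying answers to an identical query."""
    return random.choice(["yes", "no", "yes"])

def sample_answers(prompt: str, n: int = 5) -> Counter:
    """Collect n answers to one prompt and count how often each distinct answer appears."""
    return Counter(call_model(prompt).strip().lower() for _ in range(n))

votes = sample_answers("Does this contract clause allow early termination? Answer yes or no.")
print(votes.most_common())  # Disagreement is a cue to verify manually, not accept the first reply.
```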

Third, in deciding whether and how to automate tasks. This demands self-awareness about AI's suitability for one's workflow and flexibility in adapting work processes.

Flowchart of workflow automation with GenAI, showing manual work, prompt iteration, and evaluation processes with metacognitive elements.
Metacognitive efforts through GenAI. | Image: Microsoft Research, University College London, University of Edinburgh

Practical improvements

The researchers suggest several strategies to enhance AI interactions:

  1. Better planning through "Think Aloud": Users should verbalize or write down their thoughts while using AI. This helps clarify goals and break tasks into systematic steps.
  2. Active self-evaluation: Take time for reflection after each AI interaction (see the sketch after this list) by asking:
    • Was my AI instruction precise enough?
    • How much time did I spend revising AI output?
    • Would another approach have been more efficient?
  3. Strategic self-management: Users should define distinct work modes:
    • A "thinking mode" for careful prompt planning
    • A "reflection mode" for reconsidering decisions
    • An "exploration mode" for creative AI experimentation
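As a rough illustration of how these strategies could be combined, the sketch below tags each AI session with one of the suggested work modes and logs the self-evaluation questions afterwards. The mode names and questions come from the article; the class names and the plain-list log are illustrative assumptions.

```python
# Sketch: tag each AI session with a work mode and log the reflection questions.

from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    THINKING = "thinking"        # careful prompt planning
    REFLECTION = "reflection"    # reconsidering decisions
    EXPLORATION = "exploration"  # creative experimentation

@dataclass
class Reflection:
    mode: Mode
    prompt_was_precise: bool     # Was my AI instruction precise enough?
    minutes_spent_revising: int  # How much time did I spend revising AI output?
    better_approach: str         # Would another approach have been more efficient?

log: list[Reflection] = []

def record(reflection: Reflection) -> None:
    """Append one post-interaction reflection; review the log periodically to spot patterns."""
    log.append(reflection)

record(Reflection(Mode.THINKING, prompt_was_precise=False,
                  minutes_spent_revising=25, better_approach="split the task into two prompts"))
```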

"GenAI systems, with their model flexibility and generality, have the potential to adaptively nudge this kind of self-evaluation at key moments during user workfows, effectively acting as a coach or guide for users," the researchers write.

Interface design suggestions

The authors propose several approaches for more interactive chat interfaces to reduce users' metacognitive workload. These include integrated planning tools, self-assessment prompts, and workflow management features for platforms like ChatGPT, Microsoft Copilot, and GitHub Copilot.
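What such an intervention could look like behind a chat interface can be sketched in a few lines: a thin wrapper counts how often the same request is regenerated and, past a threshold, asks the user to restate the goal instead of retrying. The class and threshold below are illustrative assumptions and do not reflect any real ChatGPT or Copilot API.

```python
# Hypothetical nudge: count regenerations of the same request and, past a
# threshold, prompt the user to restate their goal. `call_model` is a stub.

def call_model(prompt: str) -> str:
    """Placeholder for the underlying model client."""
    return f"[model reply to: {prompt[:40]}...]"

class NudgingChat:
    def __init__(self, regenerate_threshold: int = 3):
        self.regenerate_threshold = regenerate_threshold
        self.attempts: dict[str, int] = {}

    def ask(self, prompt: str) -> str:
        self.attempts[prompt] = self.attempts.get(prompt, 0) + 1
        if self.attempts[prompt] >= self.regenerate_threshold:
            # Metacognitive intervention instead of yet another regeneration.
            return ("You have sent this request several times. What exactly is missing "
                    "from the previous answers? Restating the goal often beats retrying.")
        return call_model(prompt)

chat = NudgingChat()
for _ in range(3):
    print(chat.ask("Summarize this report."))
```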

Chat window showing a structured task breakdown for a cancellation letter, with content requirements, examples, and input fields for personalized details.
Hypothetical example of a planning-oriented metacognitive intervention integrated into ChatGPT. | Image: Microsoft Research, University College London, University of Edinburgh
Two screenshots of the Copilot interface showing productivity tips: a note about an average processing time of 45 minutes and a recommendation for more efficient summaries.
Hypothetical example of a metacognitive intervention focusing on self-assessment integrated into Microsoft Copilot. | Image: Microsoft Research, University College London, University of Edinburgh
Python code snippet with GitHub Copilot integration, self-evaluating prompts, and annotation overlays for repository API queries.
Hypothetical example of a metacognitive intervention focusing on self-management and self-assessment for coding in GitHub Copilot. | Image: Microsoft Research, University College London, University of Edinburgh

Current usage issues

Usage data reveals significant room for improvement: 26 percent of surveyed programmers avoid tools like GitHub Copilot due to disruptive AI suggestions, while 38 percent cite time-consuming debugging of generated code. Only 20 to 30 percent of Copilot suggestions are accepted by users.

The researchers note that implementing these improvements requires balancing several factors: interventions should adapt to expertise and workflow, support should gradually decrease with experience, and there must be equilibrium between assistance and cognitive load.

New ways of working

Intensive planning, preparation, and segmentation of tasks before handing them over to generative AI systems is a completely new way of working for most people. Carefully judging at which point, to what extent, and at what quality systematic prompts yield the desired results is not easy and requires guidance and training. From my own experience: anyone who overcomes the initial mental hurdle of working professionally with generative AI opens the door to a new world of working in tandem.

Summary
  • A Microsoft Research study finds that effectively using generative AI requires strong metacognitive skills, similar to those of a manager delegating tasks to a team, such as clearly articulating goals, breaking them down into communicable tasks, evaluating outputs, and adjusting plans accordingly.
  • The study identifies three key areas where metacognition is crucial: prompting (formulating clear instructions for AI), evaluating AI outputs (requiring appropriate confidence in one's evaluation abilities), and deciding whether and how to automate tasks (demanding self-awareness about AI's suitability for one's workflow).
  • The researchers suggest strategies to enhance AI interactions, such as better planning through "Think Aloud," active self-evaluation, and strategic self-management. They also propose interactive chat interface improvements, including integrated planning tools, self-assessment prompts, and workflow management features for platforms like ChatGPT, Microsoft Copilot, and GitHub Copilot.
Kim is a regular contributor to THE DECODER. He focuses on the ethical, economic, and political implications of AI.