Microsoft study shows AI copilot development can be overwhelming

A study by researchers at Microsoft Research has identified problems developers face when building AI copilots.

As more companies deploy AI copilots powered by large language models (LLMs) to assist users in applications such as Word and Excel and with tasks such as programming and image and video creation, software developers are entering uncharted territory in integrating these technologies.

Microsoft researchers interviewed 26 professional software developers responsible for copilot development at a variety of companies. Their key finding is that development processes and tools have not kept pace with the challenges and scope of AI application development.

The development process for a copilot follows a rough sequence of exploration, implementation, evaluation, and productization. But because of the unpredictable nature of AI, the process is "messy and iterative," the researchers say.

The "messy and iterative" development process for AI copilots. | Image: Henley et al.

Developers must identify relevant use cases, assess feasibility with different technologies, and ultimately deliver a product to real users. Each of these steps presents its own set of challenges. The study categorized the pain points into six areas:

1. Prompt engineering is time-consuming and requires extensive trial and error to balance context and token count. It is "more art than science" and models are "very fragile" with "a million ways you can effect it."

2. Orchestrating multiple data sources and prompts to understand user intent and control workflows is complex and error-prone.

3. Testing is crucial but tedious because of LLM unpredictability. Developers either run tests repeatedly and look for matching outputs or build expensive benchmarks.

4. There are no best practices for working with LLMs. Developers rely on Twitter and papers in a rapidly evolving field that demands constant rethinking.

5. Security, privacy, and compliance require guardrails, but telemetry data collection is restricted for privacy. Security reviews become laborious.

6. The developer experience suffers from inadequate tools and integration issues. Developers must continuously learn new tools instead of focusing on customer problems.
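To make the testing pain point (item 3) more concrete, here is a minimal Python sketch of the "run it repeatedly and look for matches" approach. It is not taken from the study: the `pass_rate` helper, the `make_flaky_model` stand-in, and the substring-match criterion are all assumptions made for illustration, and a real copilot would call an actual LLM instead of the stub.

```python
import random
from typing import Callable


def pass_rate(generate: Callable[[str], str], prompt: str,
              expected: str, runs: int = 20) -> float:
    """Call `generate` `runs` times and return the fraction of outputs
    that contain `expected` (case-insensitive substring match)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    hits = sum(
        1 for _ in range(runs)
        if expected.lower() in generate(prompt).lower()
    )
    return hits / runs


def make_flaky_model(seed: int, failure_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: returns the right answer
    most of the time and an unhelpful one with probability `failure_rate`."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        # Assumption: unpredictability is modeled as a random failure.
        if rng.random() < failure_rate:
            return "I am not sure."
        return "The capital of France is Paris."

    return generate


if __name__ == "__main__":
    model = make_flaky_model(seed=0, failure_rate=0.3)
    rate = pass_rate(model, "What is the capital of France?", "Paris", runs=50)
    print(f"pass rate over 50 runs: {rate:.2f}")
```

Because a single run says little about a non-deterministic system, the sketch reports a pass rate rather than a pass/fail result. Each run against a real model costs time and money, which may be part of why the developers in the study described testing as tedious.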

Focus group sessions identified potential improvements in future tools and processes, such as better support for writing, validating, and debugging prompts; more user transparency and control; automated measurement procedures; rapid prototyping options; and easy integration into existing code.

The study highlights the significant disruption caused by generative AI in software products, bringing both opportunities and uncertainties. Software development may need to be rethought due to rapidly evolving models.

"The proliferation of product copilots, driven by advancements in LLMs, has strained existing software engineering processes and tools, leaving software engineers improvising new development practices," the researchers conclude.
