- Added AlgorithmWatch's final report on Bing Chat misinformation
Update from December 18, 2023:
AlgorithmWatch publishes the final report of its Bing Chat investigation. According to the organization, the problem of false answers persists in the election scenarios examined. In some cases, Bing Chat even invented scandalous stories about election candidates, sources included.
Another issue is the inconsistency of answers and frequent evasiveness, which reduce the chatbot's value as an information tool, AlgorithmWatch says. In the organization's view, Microsoft is "unable or unwilling" to address these issues, generative AI needs to be regulated, and tech companies must be held accountable.
Original post from October 5, 2023:
Microsoft's Bing Chat botches election information, endangers democracy, study finds
No one should use Microsoft's Bing Chat to find out about upcoming elections or votes, according to a new study.
The research by AlgorithmWatch and AI Forensics, in collaboration with Swiss radio and television stations SRF and RTS, found that Bing Chat gave incorrect answers to questions about elections in Germany and Switzerland.
Since the end of August, the team has tested the quality of Bing Chat's answers to questions about the Bavarian and Hessian state elections and the Swiss federal elections.
The queries were made over a network of VPNs and private IP addresses in Switzerland and Germany, with language and location parameters chosen to reflect potential voters in the respective election regions.
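AlgorithmWatch has not published its collection tooling, and Bing Chat offers no public API, so the following is only a minimal sketch of what geo-localized automated querying of this kind could look like. The endpoint, proxy addresses, region table, and response schema are all hypothetical placeholders, not the study's actual setup.

```python
import requests

# Purely illustrative: Bing Chat has no public API, so this endpoint,
# the proxies, and the response schema are hypothetical placeholders.
CHAT_ENDPOINT = "https://chat.example.com/ask"

# One proxy/locale pair per election region, mirroring the idea of
# matching language and location parameters to potential voters.
REGIONS = {
    "bavaria":     {"proxy": "http://de-proxy.example:8080", "locale": "de-DE"},
    "hesse":       {"proxy": "http://de-proxy.example:8080", "locale": "de-DE"},
    "switzerland": {"proxy": "http://ch-proxy.example:8080", "locale": "de-CH"},
}

def ask(prompt: str, region: str) -> str:
    """Send one election-related prompt through a region-specific proxy."""
    cfg = REGIONS[region]
    resp = requests.post(
        CHAT_ENDPOINT,
        json={"prompt": prompt, "locale": cfg["locale"]},
        proxies={"http": cfg["proxy"], "https": cfg["proxy"]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["answer"]  # assumed response field

if __name__ == "__main__":
    # A query of the kind the study posed ("Who are the top candidates in
    # the 2023 Hesse state election?"); answers would then be checked by
    # hand against official candidate lists and published polls.
    print(ask("Wer sind die Spitzenkandidaten bei der Landtagswahl in Hessen 2023?", "hesse"))
```

The detail worth noting is the locale/proxy pairing: the point of such a setup is to evaluate answers as voters in each region would actually receive them.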
Data collection began on August 21, and the team is still analyzing the data, but preliminary results show clear trends, according to AlgorithmWatch.
Bing Chat misleads those interested in politics
Bing Chat was particularly misleading when asked about the latest poll results for the upcoming elections in Bavaria. It incorrectly reported that the "Freie Wähler" would receive only 4 percent of the vote, while polls at the time put the party at between 12 and 17 percent.
Bing Chat also failed to correctly answer questions about the parties' top candidates for the 2023 state elections in Hesse. It named incorrect candidates and repeatedly identified a retired politician as the CDU's top candidate.
Invented poll results
Bing Chat often cites reputable sources that contain correct poll numbers, but then gives nonsensical figures in its own answers. For example, the chatbot repeatedly claimed that the "Freie Wähler" had lost support because of the Aiwanger scandal, although the opposite was true.
False information about candidates
The chatbot also provided false information about the candidates for the 2023 state elections in Hesse, often naming well-known politicians from the respective party even when they were not running at all. For example, Volker Bouffier was frequently named as the CDU's top candidate, even though he retired from politics in May 2022.
False reports in the Aiwanger case
Bing Chat confused problematic statements Hubert Aiwanger made about the coronavirus pandemic with the leaflet affair. In one reply, the scandal was interpreted one-sidedly from Aiwanger's point of view; in another, Bing attributed the leaflet affair to the Left Party rather than to Aiwanger. Still, the chatbot answered eight of ten questions about the Aiwanger case correctly and neutrally.
Misleading information about parties
When asked which parties were participating in the elections, Bing did not give a single fully correct answer. In all twelve answers, the CVP was listed among the six largest parties instead of its successor "Die Mitte". Eight responses named the BDP as a party standing in 2023, even though it no longer exists.
Karsten Donnay, assistant professor of political behavior and digital media at the University of Zurich, speaks of an "uncritical use of AI," with companies launching unreliable products without facing legal consequences.
In response to the research, a Microsoft spokesperson tells AlgorithmWatch that the company is committed to improving its services and has made significant progress in the accuracy of Bing Chat's responses.
Microsoft also offers a "Precise" mode for more accurate responses and encourages users to provide feedback, the spokesperson said. In this mode, Bing Chat relies entirely on GPT-4, OpenAI's most capable language model. AlgorithmWatch used the "Balanced" setting, which combines Microsoft's own models with OpenAI's and reportedly produces more hallucinations than GPT-4 alone.
However, Matthias Spielkamp, CEO and co-founder of AlgorithmWatch, criticized Microsoft's response, saying the company addressed only specific issues while ignoring the structural problems of generative AI. He warned against trusting Microsoft's promises about the reliability of its information, calling them irresponsible and driven by a desire to sell more products.
Regulation and political intervention
Under the EU's Digital Services Act (DSA), digital platforms and search engines with more than 45 million monthly active users in the EU, including Microsoft Bing, are required to conduct risk assessments and develop mechanisms to minimize the risks posed by their services.
These include potential negative impacts on the integrity of electoral processes and public debate, and the spread of misinformation.
Microsoft has not yet responded to AlgorithmWatch's inquiries about whether it considers Bing Chat's incorrect answers about the upcoming election to be a systemic risk under the DSA and what action it intends to take.
The European Commission considers the findings of AlgorithmWatch and AI Forensics to be highly relevant to the DSA and reserves the right to take further action.
AlgorithmWatch confirms what we already knew
AlgorithmWatch's findings are not new: Bing Chat has been criticized for misinformation since day one. ChatGPT's web browsing feature has similar weaknesses and significant disadvantages compared to traditional internet search.
The new study highlights the problems that arise when probabilistic systems are used to answer questions that have a single correct answer. It again raises the question of why companies like Microsoft are allowed to deploy large language models without restriction in unsuitable application scenarios, even though their weaknesses and the associated risks are well known.
Microsoft knew about Bing Chat's issues before release, but decided to launch the product anyway to put pressure on Google in the search market, so far without success.