Using Google Bard as an example, guest contributor Aditya Anil explains how Daniel Kahneman's ideas can help build better chatbots.
"Thinking, Fast and Slow" is a New York Times best-seller book by psychologist and Nobel Laureate Daniel Kahneman. The book presented his hypothesis on how and what drives our thinking.
This hypothesis of his is currently being harnessed by AI chatbots like Google’s Bard to make themselves more efficient and accurate.
But how exactly does the hypothesis by Daniel Kahneman covered in the book, help develop AI chatbots?
The Two Systems That Drive Thinking
Kaheman’s book explores two systems of thinking -
- intuition-based thinking (which is referred to as System 1 thinking), and
- slow thinking (which is referred to as System 2 thinking).
System 1 according to Kaheman is fast, instinct based and emotional; while System 2 is slow, deliberative and logical. While both systems play a crucial role in decision-making, one system tends to be more active than the other depending on the situation.
System 1 operates quickly and effortlessly. Action under this system takes little or no effort with no sense of voluntary control.
This includes actions like reading words on a poster, detecting if an object is far or near with respect to another object, identifying a sound that you hear, and so on.
System 2 on the other hand is more conscious and logical. Actions under this system take a long time, with voluntary controls. This system is activated when you perform abstract and logical thinking.
This includes actions like identifying someone in a crowd, doing long calculations in your head, playing chess, and so on.
Recently, the concept of two systems is being used by Bard (the AI chatbot by Google) to improve its mathematical and string operations, making its response more dynamic and accurate.
But how Bard uses this psychological concept for enhancing its own AI system?
How the Principles of Thinking Help AI
Before we dive right in, let us understand the main advantages and disadvantages of each system.
The book points out that System 1 thinking is responsible for 98% of all our thinking, while System 2 thinking is responsible for the remaining 2% and is a slave to System 1.
But both systems have its advantages and disadvantages and it influences heavily on our decision-making capabilities.
Disadvantages of Each System
Relying too heavily on System 1 thinking can lead to biases and mistakes. Some of the caveats of System 1 thinking are the following:
- Heavy indulgent of Confirmation bias
- Tendency to Overlook concrete and important details
- Ignoring evidence we dislike, that leads to ignorance
- Overthinking seemingly simple or irrelevant decisions
- Producing questionable justifications for bad decisions
and so on.
On the other hand, relying too much on System 2 thinking can also lead to mistakes and negative consequences. These include:
- Overthinking simple decisions and wasting huge amount of time
- Inability to make quick decisions
- Being too sceptical and withholding judgment too much
- Experiencing decision fatigue and cognitive overload
- Making decisions that are too logical and not taking into account emotions
Two Systems of Thinking: Applied to AI
While in the human domain, this is highly psychological, things get pretty interesting when this concept is applied to AI and Computing.
LLMs (AI model that powers chatbots like Bard and CHatGPT) can be thought of as running in System 1.
LLMs (the AI models that run these chatbots) work by finding patterns in the billions of training data that it was trained on before, and it generates a response that matches the common pattern. For example, when you say a chatbot to “write an essay on Climate Change”, the following is the process in the backend -
- Find matching queries in its vast training database. The chatbot tries to find a common query that includes the keywords “climate change” and “essays”.
- Finding a trend or pattern. The chatbot then tries to find a common trend or pattern among all selected data. For example, the pattern could be that almost all of the data has to mention ‘carbon emissions’, ‘carbon footprinting’, ‘plastic pollution’, ‘global warming’, and so on. Moreover, the title-and-paragraph formats of essays are also a pattern in itself ( in contrast to other formats like poems, blogs, etc.).
- Generating a text according to the pattern. This is the fun part. Think of this process as solving a jigsaw puzzle.
The bot tries to generate the text with using the bits of data (the puzzle pieces) and tries to make it resemble the pattern of a similar essay (the final pictures), which in this case is an essay on climate change. It creates several iterations (i.e. outputs) of the prompt that you gave and compares it to reference data, which could be an already-written essay on climate change.
- Giving the output. The iteration that closely matches the desired result is chosen, and printed on the screen.
This process may look lengthy, but it takes only a couple of seconds to perform in traditional LLMs. The 1st step is done way before in the developing and training phase of an LLMs, which consists of the AI model being trained on datasets containing billions of data. After learning from this massive dataset, and finding the pattern in all of them, the chunky and hard part of the LLM process is done.
The rest of the step is rather quick, owing largely to the quality of the data on which the model was trained. Generally, the better the training data supplied, the better the predictions and generation.
Thus LLM generates texts effortlessly without ‘thinking’ much. It just finds the pattern and compares the output to the reference.
Therefore, LLMs lie in System 1 - which is quick and efficient. However, the drawback to this is that LLMs can generate incorrect and biased outputs, and even make up their own facts and figures (AI hallucination).
This is the reason for the following case, where sometimes the Bard shows effortless result for hard prompts, but fail miserably in the easy tasks like the one below -
This is because solving a certain maths problem is efficient when you follow a specific sequence of steps, rather than relying on ‘patterns’ of similar math problems.
This is where Traditional computing works better. For instance, the way how the calculators in your computer work.
Traditional computing follows a sequence or a structure, which is in the form of code or a simple algorithm. With this respect, traditional computing is preferred to do tasks such as doing mathematical problems, manipulating string operations, doing conversions and so on. The drawback is that since it follows a specified format, it may not be necessarily fast, or efficient most of the time. The traditional computer can find the answer to the questions like 12*24 = 288 but takes a longer time to do calculus-related questions.
However, the plus point here is that it is almost certain to get the right answer most of the time.
Observe that Traditional computing is rather slow, more logical and structured when compared to LLMs
Thus Traditional computation falls under system 2. It’s relatively slow, much more systematic and logical. It consists of an algorithm, code or any other hard-coded execution system.
Google’s Bard quite interestingly is trying to use both systems to make their chatbot’s response more optimum.
How Bard uses it
Bard had a tough start when it was launched. The initial promotional video that showcased Bard’s capability faced huge criticism after the response consisted of misinformation.
Not to be a ~well, actually~ jerk, and I'm sure Bard will be impressive, but for the record: JWST did not take "the very first image of a planet outside our solar system".
— Grant Tremblay (@astrogrant) February 7, 2023
Hence, it was important for Bard to make their AI bot more accurate, consisting of less bias or misinformation. This is a challenging goal to reduce misinformation and increase efficiency in almost all AI tools out there.
Thus, owing to this, Google released a blog on June 7th under the title - “Bard is getting better at logic and reasoning”.
The blog highlighted two new features in Bard.
One of them was the export to Google Sheets feature, which allows user to export their output containing tables into Google Sheets.
The other feature allowed Bard to - in their own words - “get better at mathematical tasks, coding questions and string manipulation”
Bard earlier struggled with maths problems, and it still does every now and then. But using the approach of combining the two Systems that I mentioned above, Bard aims to get better now, correcting its silly maths errors.
This new technique that Bard uses is called the ‘implicit code execution’.
While the LLMs (System 1, consisting of fast and pattern-based responses) receive the prompt, implicit code execution allows Bard to detect computational prompts (System 2, consisting of logic and systematic execution) and run code in the background.
This helps Bard to give responses to mathematical and string-based prompts a lot easier.
In the example mentioned the blog, Google said Bard will get better at answering prompts like:
- What are the prime factors of 15683615?
- Calculate the growth rate of my savings
- Reverse the word “Lollipop” for me
The following extracts from the blog capture the essence and motivation of using this approach (of using the two systems of thinking approach) -
“As a result, they’ve been extremely capable on language and creative tasks, but weaker in areas like reasoning and math.
In order to help solve more complex problems with advanced reasoning and logic capabilities, relying solely on LLM output isn’t enough.
LLMs can be thought of as operating purely under System 1 — producing text quickly but without deep thought …Traditional computation closely aligns with System 2 thinking: It’s formulaic and inflexible, but the right sequence of steps can produce impressive results, such as solutions to long division.”
— Google in its blog
This approach of keeping LLMs and Traditional computing at System 1 and System 2 respectively ensures that the response is much more accurate and efficient.
Using this approach, Bard - according to the blog - showed a nearly 30% accuracy boost in dealing with word and math problems.
How Reliable is this New Approach
While this does improve Bard’s accuracy while dealing with mathematical and word problems, it may not be the best approach to make the chatbot efficient.
While it does shows significant accuracy while dealing with maths and word problems, it still struggles while dealing with code-related problems.
"Even with these improvements, Bard won’t always get it right — for example, Bard might not generate code to help the prompt response, the code it generates might be wrong or Bard may not include the executed code in its response," Google says at the end of the blog.
Thus, while this is a significant change, Bard still has to cover longer miles to be fully reliable.
Reducing misinformation and increasing efficiency are the challenges for nearly all the chatbots out there.
While progress is being made, there is still a long way to go.