Do Chatbots Hallucinate?

Shivak Singh
3 min read · Jul 11, 2023
Photo by Sanket Mishra on Unsplash

Email was invented to reduce travel time and expense and to let teams working across different time zones exchange information asynchronously. Telephone conference calls require all participants to be awake at the same time, which made managing people across multiple time zones difficult. Email allowed individuals to read and respond whenever they were awake, making communication largely independent of time zones.

Initially, the inventors of email believed it would reduce the need for conference calls and travel, since it allowed for efficient communication. A few years later, however, they observed that travel budgets had actually increased: the productivity gains meant teams could travel longer distances and meet more stakeholders. It may seem odd to discuss email in the age of artificial intelligence and ChatGPT, but it is an instructive example of a technology that disrupted how our society works. Many lessons from that episode apply to the current generative AI revolution, as employees once again weigh whether the new disruption is a threat or an opportunity.

Artificial intelligence (AI) had its beginnings in the 1960s, with organizations like ARPA (the Advanced Research Projects Agency) funding and promoting early machine learning research. Large language models such as ChatGPT and Bard have created a buzz because they have been trained on vast amounts of text from online sources. The conversational models built by OpenAI and Google draw on this ocean of online content to answer our prompts. For example, when prompted to generate a biography of TAFE’s managing director, a model can produce a well-worded response that is nonetheless factually incorrect. The internet is filled with examples that highlight this limitation of AI-based chatbots.

The modus operandi of these language models becomes clearer when we examine their inner workings. They break text into small units (tokens) and learn the probability of a token appearing within a certain proximity to another token. When generating text, they use these probabilities to pick the likely next token in the sentence. Because the tokens are drawn from web pages across the internet, the model might inadvertently pick up a fragment from someone else’s biography, confusing it about whose biography it is writing. Technical experts commonly refer to this phenomenon as “hallucination”.
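To make that concrete, here is a deliberately tiny sketch in Python: a bigram model that counts which token follows which in a small made-up corpus, then samples the next token from those counts. It is nowhere near a real large language model, but it shows how “pick the statistically likely next token” can splice together fragments from different sources, which is the seed of a hallucination.

```python
# Toy illustration only (not how production LLMs work): a bigram model that
# counts which token tends to follow which, then generates text by sampling
# the next token in proportion to those counts.
import random
from collections import defaultdict

corpus = (
    "the managing director leads the organisation . "
    "the managing director of another company won an award ."
)
tokens = corpus.split()

# Count how often each token follows another.
follow_counts = defaultdict(lambda: defaultdict(int))
for current, nxt in zip(tokens, tokens[1:]):
    follow_counts[current][nxt] += 1

def next_token(current):
    """Sample the next token in proportion to how often it followed `current`."""
    candidates = follow_counts[current]
    choices, weights = zip(*candidates.items())
    return random.choices(choices, weights=weights)[0]

# Generate a short continuation. Because "director" appears in two different
# "biographies" in the corpus, the model can splice facts from the wrong one,
# a miniature version of hallucination.
word, output = "the", ["the"]
for _ in range(8):
    word = next_token(word)
    output.append(word)
print(" ".join(output))
```

Run it a few times and the output sometimes mixes the two sentences, attributing the award to the wrong director. A real model works with billions of parameters rather than raw counts, but the failure mode is recognisably the same.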

Drawing an analogy from Sigmund Freud’s psychoanalytic theory, which posits an id and an ego within the human mind, we can say that when the id and ego become unruly, the superego steps in to manage the situation. We do not yet have an artificial superego to handle chatbot hallucinations. Chatbots can produce many useful outputs, including software, but the challenge lies in evaluating whether AI-generated code is safe, secure, and free of bugs. For now, humans have to act as the superego. So, while there are concerns about chatbots taking over all the work, the reality is that employees will be occupied with verifying and refining the content the bots generate.
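As a small, hypothetical illustration of why that human “superego” still matters, consider the kind of code a chatbot can plausibly produce: it runs, it even looks tidy, but it contains a classic security flaw that a reviewer has to catch. The snippet below is invented for this article, not taken from any real chatbot transcript.

```python
# Hypothetical example of plausible-looking, bot-generated code that a human
# reviewer still has to vet.
import sqlite3

def find_user(conn: sqlite3.Connection, name: str):
    # Bot's version: interpolates untrusted input straight into the SQL
    # string, which is vulnerable to SQL injection.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_reviewed(conn: sqlite3.Connection, name: str):
    # Human reviewer's fix: pass the value as a bound parameter instead.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Nothing in the first function fails a casual test, which is exactly the problem: the verification work falls on the person who ships it.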

Assessing the factual accuracy of chatbot output is no easy task. In a recent and somewhat amusing case, a US lawyer used a chatbot to generate arguments and references for a legal pleading. The chatbot invented references that did not exist, which did not please the judge: the bot had produced seemingly reasonable case citations that turned out to be fictitious. We currently lack mechanisms to readily determine the accuracy of a chatbot’s output. The amusing yet dangerous aspect of this situation is that AI models can produce very persuasive-sounding arguments even when they are wrong.

If we use chatbots for amusement, such as asking ChatGPT to write a story about how I met an alien, the responses can be quite entertaining. Relying on them for financial, medical, or dietary advice, however, is strongly discouraged. Such high-risk areas require far more sophisticated tools for scrutiny and regulation.

If you enjoyed this article, help others find it by holding the 👏 button until the heavens drop. You can give up to 50 👏.
