From ChatGPT’s confident falsehoods to Whisper’s phantom words, the article argues that AI errors may offer a powerful mirror for how predictive systems, human or artificial, construct meaning when information is incomplete.

Perspective: Does ChatGPT need a psychiatrist? Similarities between human psychopathology and errors in large language models.
A recent Perspective article published in the journal NPP–Digital Psychiatry and Neuroscience compared errors in artificial intelligence (AI) systems with confabulations and hallucinations observed in psychiatry.
Large language models (LLMs) and speech recognition tools have gained widespread popularity, transforming education, business, healthcare, and research. However, significant concerns remain over the tendency of these AI systems to produce misinformation. For instance, ChatGPT sometimes generates text that is factually incorrect but appears plausible.
Likewise, automatic speech recognition (ASR) tools, such as Whisper, can produce severe transcription errors under some conditions, resulting in output that is nonsensical or unfaithful to the source input. Such errors are often called hallucinations, but the term needs clarification. In humans, hallucinations are false sensory experiences that occur in the absence of external stimuli. LLM errors, however, do not involve perception and are better described as confabulations.
In contrast, ASR errors are structurally different from LLM errors and may be more akin to hallucinations in a functional, rather than experiential, sense. Although LLM errors may appear at first glance to be software bugs, they resemble psychiatric phenomena in several ways. In both artificial and biological systems, outputs can appear coherent, context-sensitive, and confident while being detached from reality. In the article, the authors explored the parallels between human psychiatric symptoms and AI model output errors.
Confabulations in AI Systems and Humans
Confabulations are false memories produced to fill memory gaps, with no intent to deceive. These fabrications appear coherent in the context of the individual’s life. They are most common in conditions associated with memory impairment, e.g., Korsakoff’s syndrome and dementia. Confabulations range from minor inaccuracies to highly detailed fabrications, highlighting the brain's role in memory construction and reconstruction.
LLMs, such as ChatGPT, exhibit similar flaws. Under certain conditions, ChatGPT generates nonsensical or inaccurate information. In particular, it produces incorrect but plausible-sounding responses when information is missing, i.e., when there are functional “memory gaps” related to training data, parameter encoding, or context limitations rather than human-like episodic memory gaps. Vague, ambiguous, or broad prompts encourage AI models to fill in missing information based on assumptions derived from their training data.
Tricky questions and multi-step reasoning tasks can also lead the model to generate logically consistent but incorrect responses. Notably, both LLM and human confabulations are context-dependent. In humans, emotional states, strong beliefs, or leading questions increase the odds of filling memory gaps with inaccurate but plausible details. Similarly, GPT models can confidently generate inaccurate responses when the prompt itself embeds assumptions.
The similarities between LLM and human confabulation lie in gap-filling, observable behavior, and coherence-seeking, but the mechanisms remain fundamentally distinct. LLMs lack self-modeling consciousness, executive control, or episodic memory. While earlier LLMs operated within fixed parameters and lacked persistent memory across interactions, newer models store limited user-controlled information across sessions but do not incorporate it into a continuously self-updating model.
Hallucinations in Humans and ASR Systems
ASR and human hallucinations show some superficial similarities. In Whisper, more than one-third of hallucinations contain explicitly harmful content, such as demographic stereotypes, physical violence or death, and sexual innuendo. Similarly, human auditory verbal hallucinations (AVHs) often have threatening and negative content. Voices in both non-clinical and clinical groups often deliver threats of violence and verbal abuse.
Further, both human and ASR hallucinations show repetition. Whisper hallucinations can loop phrases or words endlessly, much as human AVHs repeat themes and wording. At a behavioral level, both ASR systems and humans appear particularly vulnerable to hallucination-like errors or experiences when perceptual signals are degraded or weak. Mechanistically, human AVHs involve aberrant corollary discharge mechanisms, cortical-subcortical loops, and predictive processing of sensory input.
On the other hand, Whisper hallucinations reflect probabilistic pattern completion applied to acoustic features. These systems do not hear voices; they compute probabilities over acoustic-to-text mappings. Repetitive or harmful content does not arise from misattributed inner speech, affective states, or threat processing, but from statistical regularities in the training data.
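The looping pattern described in this section can be made concrete with a small sketch. The heuristic below is not taken from the Perspective article or from Whisper itself; the function names, the four-word phrase limit, and the three-repeat threshold are all illustrative assumptions. It simply flags a transcript in which the same short phrase follows itself several times in a row.

```python
from typing import List, Tuple


def longest_repeat_run(words: List[str], max_ngram: int = 4) -> Tuple[Tuple[str, ...], int]:
    """Find the phrase (up to max_ngram words) repeated back-to-back most often.

    Returns (phrase, count); ((), 1) means no immediate repetition was found.
    """
    best: Tuple[Tuple[str, ...], int] = ((), 1)
    for n in range(1, max_ngram + 1):
        for start in range(len(words) - n + 1):
            gram = tuple(words[start:start + n])
            count, pos = 1, start + n
            # Count how many times the same phrase follows itself immediately.
            while tuple(words[pos:pos + n]) == gram:
                count += 1
                pos += n
            if count > best[1]:
                best = (gram, count)
    return best


def looks_like_loop(transcript: str, min_repeats: int = 3) -> bool:
    """Hypothetical flag: True when some phrase repeats at least min_repeats times in a row."""
    _, count = longest_repeat_run(transcript.lower().split())
    return count >= min_repeats


# Made-up transcripts for illustration only.
looping = "thank you for calling thank you thank you thank you thank you"
clean = "the patient reported mild pain"

print(longest_repeat_run(looping.split()))
print(looks_like_loop(looping), looks_like_loop(clean))
```

A check like this only catches verbatim loops; it says nothing about whether the repeated phrase was actually spoken, so it would at most mark a segment for human review.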
Mitigation Strategies
In psychiatry, cognitive-behavioral therapy is used to improve the ability of individuals to critically assess the validity of their hallucinations. Similar mechanisms could be integrated into AI systems. Plausibility assessment can be operationalized using uncertainty estimation methods. Moreover, internal consistency checks can be implemented to require the model to re-assess its output.
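One simple way to picture an internal consistency check is to ask the same question several times and see whether the answers agree. The sketch below is an assumption-laden illustration, not the method proposed in the article: the sampled answers are hard-coded stand-ins for repeated model outputs, and the function names and the 0.6 agreement threshold are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def normalize(answer: str) -> str:
    """Lowercase and trim whitespace and end punctuation so trivial variants match."""
    return answer.strip().strip(".!?").lower()


def consistency_check(samples: List[str], min_agreement: float = 0.6) -> Tuple[Optional[str], float]:
    """Return (majority answer, agreement); the answer is None when agreement is too low."""
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(samples)
    return (answer if agreement >= min_agreement else None), agreement


# Hypothetical repeated answers to the same question (no model is called here).
stable = ["Paris", "paris.", "Paris", "Paris ", "Lyon"]
unstable = ["1912", "1915", "1921", "1912", "1908"]

print(consistency_check(stable))    # high agreement: the majority answer is kept
print(consistency_check(unstable))  # low agreement: the answer is withheld
```

Low agreement is treated here as a rough proxy for uncertainty; high agreement does not guarantee correctness, since a model can be consistently wrong.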
Increasing resource allocation for LLMs, including multi-pass verification or additional processing steps, can reduce error rates by enabling more thorough, slower internal assessment. Other mitigation strategies include retrieval-augmented generation, cross-model verification, multi-agent debate, semantic entropy methods, prompt design, and temperature tuning. A similar principle may apply to humans: error monitoring and cognitive performance rely on sufficient neurobiological resources, which are restored by processes such as sleep. Strengthening systemic capacity, rather than targeting individual symptoms, may therefore decrease susceptibility to errors.
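Of the strategies listed, temperature tuning is the easiest to show in a few lines. The article does not describe how it works, so the sketch below assumes the standard approach of dividing next-token scores (logits) by a temperature before applying softmax. The logit values are made up for illustration.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher values mean a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Lower temperatures concentrate probability on the top-scoring token, which makes output more repeatable but does not make the top-scoring token correct.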
Concluding Remarks
Collectively, confabulation-like and hallucination-like errors are well-recognized problems in LLMs and LLM-dependent software, underscoring the need for continued human oversight. These errors superficially resemble pathological symptoms reported in neurology and psychiatry. Errors in human minds and machines are features of systems built for prediction and explanation. As such, comparing confabulations and hallucinations in LLMs and humans can help researchers better understand both, provided the parallels are treated as provisional, model-dependent, and mechanistically limited.
Journal reference:
- de Boer JN, Ciampelli S, Hailemariam AK, Koops S, Sommer IEC (2026). Does ChatGPT need a psychiatrist? Similarities between human psychopathology and errors in large language models. NPP–Digital Psychiatry and Neuroscience, 4(1), 12. DOI: 10.1038/s44277-026-00064-1, https://www.nature.com/articles/s44277-026-00064-1