Hi readers!
You all know about Artificial Intelligence (AI), and many of you use tools such as ChatGPT, Gemini, and Copilot for writing, editing, and spell-checking. But have you ever checked whether the judgment or advice delivered by these tools is similar to, or different from, the judgment given by human beings?
As my regular readers, you know I have blogged from A to Z of Artificial Intelligence; so today I am blogging on another interesting aspect of AI tools and what people think about their output.
Scientific American, in its May 2026 issue, published an article titled "How AI and Human Judgment Differ," which I have selected for this week's blog to bring you, in easy words, the latest happenings in the world of AI.
Walter Quattrociocchi, the author of the article, says that if you put a Large Language Model (LLM) and human judgment together in a test of reasoning, you will find the difference.
LLMs are advanced AI systems designed to understand, process, and generate human-like text. Their output often matches human responses, but the reasoning behind those responses bears no comparison to human reasoning.
He illustrated the difference with the classic example of a doctor who has touched the body, seen the organs, and studied anatomy not just by reading but through prolonged hands-on experience and training. But what if your doctor had none of these qualifications and had merely read millions of patient reports and learned how a diagnosis typically "sounds"? The prescriptions of such a doctor might be convincing; nevertheless, the moment you became aware that your doctor's knowledge is based on patterns in text rather than on the connection between doctor and patient, the prescription in your hands would immediately lose its integrity.
Similarly, many people turn to ChatGPT daily for medical advice, legal guidance, or educational coaching, knowing that these LLMs are reproducing a world they have never been in, producing results they have not generated themselves through mind and logic, and presenting only a verbal outline.
To address this question, LLMs and people were asked to make judgments in a few tests, with the expectation that the LLM tools would not think like people, but that the outcome would give a valuable understanding of how such tools differ from humans, and of how and when they should be used.
For example, LLMs and people were simultaneously asked to rate different news sources and their credibility.
A human will move step by step: first checking the headline to see whether it tells him something he does not already know; then checking whether the source has a reputation for careful reporting; and finally verifying whether the claim fits into the broader chain of events, that is, whether the event being reported actually happened and aligns with the way similar situations have unfolded in the past.
Contrary to this, LLMs cannot carry out these steps. Suppose, for example, that you asked different LLMs to evaluate the reliability of a news headline using a specific procedure. Their conclusions might be like those of humans, but their justifications would reflect patterns drawn from language (such as how often a particular combination of words occurs, and in what contexts) rather than references to external facts, prior events, or experience, which were the factors that humans considered. A language model can reproduce this form of deliberation well. It will provide statements that reflect the terminology of care, duty, or rights. It will present causal language based on patterns in language, including "if-then" counterfactuals, yet the model is not imagining anything or engaging in any deliberation; it is just reproducing patterns in people's speech or writing about these counterfactuals. The result can sound like causal reasoning, but the process behind it is pattern completion, not an understanding of how events produce outcomes in the world.
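To make this concrete, here is a minimal sketch in Python of how such a test might be run. This is my own illustration, not the procedure from the article: the `openai` client, the model name, the prompt wording, and the 1-to-5 scale are all illustrative assumptions.

```python
# Minimal sketch: asking an LLM to rate a headline's credibility.
# Illustrative assumptions (not from the article): the `openai` Python
# client, the model name, the prompt wording, and the 1-to-5 scale.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def rate_headline(headline: str, source: str) -> str:
    """Ask the model for a credibility rating and its justification."""
    prompt = (
        f"Headline: {headline}\n"
        f"Source: {source}\n"
        "Rate the credibility of this news item from 1 (not credible) "
        "to 5 (highly credible), then explain your reasoning."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(rate_headline("City council approves new water plant", "Example Daily"))
```

The rating that comes back may look sensible, but the "reasoning" in the reply is assembled from word patterns; the model has not looked up the source's track record or checked whether the event actually occurred.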
Across all the tasks studied, a consistent pattern emerges. LLMs can often match human responses, but for reasons that bear no resemblance to human reasoning: a human judges, but a model correlates; a human evaluates, while a model predicts; a human engages with the world, while a model engages with the distribution of words. Their architecture makes them extraordinarily good at reproducing patterns found in text; it does not give them access to the world those words belong to. Since human judgments are also expressed through language, the model's answers often end up resembling human answers on the surface. This gap between what models seem to be doing and what they are in fact doing is called epistemia: a situation in which the simulation of knowledge becomes indistinguishable, to the observer, from knowledge itself.
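To get a feel for what "engaging with the distribution of words" means, here is a tiny, self-contained Python toy. It is nothing like a real LLM inside (real models use neural networks trained on enormous corpora), but it shows pattern completion in miniature: it continues text purely from word-pair frequencies, with no idea what the words refer to.

```python
# Toy pattern completion: continue text using only bigram counts.
# This drastically simplifies a real LLM, but it makes the core point:
# the "knowledge" here is word-sequence statistics, not facts about the world.
from collections import Counter, defaultdict

corpus = (
    "the doctor examined the patient and the doctor wrote a prescription "
    "the patient read the prescription and the patient felt reassured"
).split()

# Count how often each word follows each other word in the corpus.
next_words = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_words[current][following] += 1

def complete(word: str, length: int = 6) -> str:
    """Extend `word` by repeatedly picking the most frequent next word."""
    out = [word]
    for _ in range(length):
        options = next_words.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(complete("the"))  # fluent-looking output, zero understanding
```

The output reads smoothly because the statistics are smooth, not because the program knows anything about doctors or patients; that, in vastly more sophisticated form, is the gap the word epistemia points at.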
Epistemic reliability is a notion in philosophy holding that justified belief or knowledge must be produced by reliable, truth-conducive processes, such as perception, memory, or good reasoning, rather than through faulty logic. It focuses on whether a method consistently yields true beliefs, forming a core component of Reliabilism.
Reliabilism is a theory in epistemology proposing that knowledge and justified belief are produced by reliable cognitive processes and methods that yield a high proportion of true beliefs.
Most people misjudge the explanations these models give because they take linguistic plausibility as a substitute for truth: an error that happens because the model is fluent, and fluency is something human readers are primed to trust. This is subtly dangerous because the model cannot know when it is hallucinating. Since it cannot represent truth in the first place, it cannot form beliefs, revise them, or check its output against the world. It cannot distinguish a reliable claim from an unreliable one except by analogy to prior linguistic patterns.
Hallucinating ordinarily means experiencing sensory perceptions, such as seeing, hearing, smelling, tasting, or feeling things that appear real but are not actually present, often caused by mental illness, drugs, or brain dysfunction. In the context of AI, hallucinating means confidently producing output that is false or made up.
People are already using these systems in contexts in which it is necessary to distinguish between plausibility and truth, such as law, medicine and psychology. A model can generate a paragraph that sounds like a diagnosis, a legal analysis or a moral argument. But sound is not substance. The simulation is not the thing simulated. None of this implies that LLMs should be rejected. They are extraordinarily powerful tools when used as what they are:
LLMs are excellent at drafting, summarizing, recombining, and exploring ideas, but not at judgment, which is a relation between a mind and the world, not between a prompt and a probability distribution.
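By way of contrast, here is the same kind of hedged sketch for a task these tools are suited to: producing a draft summary that a human then judges. Again, the client and model name are illustrative assumptions on my part, not a recommendation from the article.

```python
# Sketch of an appropriate use: drafting a summary for human review.
# Illustrative assumptions again: the `openai` client and model name.
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    """Return a two-sentence draft summary; a human still judges it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "user", "content": f"Summarize in two sentences:\n\n{text}"}
        ],
    )
    return response.choices[0].message.content

article_text = "..."  # paste the text you want summarized here
print(summarize(article_text))  # a starting point for you, not a verdict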
It is imperative to seek a clear understanding of what these models can and cannot do. Their smoothness is not insight, and their eloquence is not evidence of understanding. They are sophisticated linguistic instruments that require human oversight precisely because they lack access to the domain that judgment ultimately depends on: the world itself.
That’s all, dear readers!
See you again, Take care, Bye


