1. When AI hallucination became a real-world risk
For years, AI hallucination was treated like a party trick, a demo flaw to chuckle at.
- Bad poetry
- Wrong trivia
- Funny mistakes
→ The stakes were low, so the errors were easy to ignore.
THAT ERA IS OVER
Today, AI systems are embedded in high‑stakes workflows that shape how power, money, and information move through the world.
- Governments lean on AI report generators to brief leaders and draft policy.
- Courts use AI legal research assistants to surface precedent and structure arguments.
- Healthcare teams rely on AI in medical diagnostics to make time‑critical decisions.
- Enterprises deploy AI across governance, analytics, risk, and data‑driven decision making.
→ “Mostly right” stopped being acceptable the moment AI entered law, finance, and public policy.
AI errors no longer fail quietly; they attempt to persuade:
- Fluent language increases perceived credibility.
- Confident tone substitutes for verification.
- Structure and formatting suggest authority.
This is how AI misinformation spreads at scale.
This is not a tooling issue. This is not a prompt issue.
→ This is a trust failure. If trust collapses, we need to understand what breaks it.
2. The lie inside the AI black box
An AI black box is an AI system that makes decisions in ways humans can’t clearly see, understand, or explain. It processes on its own, performs convincingly, and people get hurt when its persuasive output is treated as reality.
Large language models optimize for probability, not truth. They predict what sounds right, not what is right, baking AI errors into the system by design.
This is why LLM hallucination can look like scholarship:
- Fake citations
- Invented rulings
- Bogus URLs
- Confident delivery
→ These are not isolated errors. They reflect a default pattern: credible-sounding structure that amplifies manipulation risk.
A structural problem, not a user error
AI’s habit of making up fake information has become one of the hottest topics on social media. In a widely shared thread on X, David Smerdon showed ChatGPT generating academic papers that looked legitimate but didn’t actually exist.
The model is biased toward confident, plausible output. Over time, this creates an AI misinformation dynamic that users experience as persuasion rather than assistance.
How?
Language models assign probabilities to word sequences. Given a prompt, they predict what comes next and enforce internal consistency across the entire document.
→ That consistency creates credibility, as internal coherence is mistaken for external truth.
- This is why AI manipulation succeeds at scale.
- This is why fabricated content passes surface review.
- This is why AI hallucination reads like credible scholarship.
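A minimal sketch of that mechanism, using a toy bigram model (the vocabulary and counts below are invented purely for illustration, not taken from any real system):

```python
import math

# Toy bigram counts: how often each word followed the previous one in
# "training" text. These counts are invented purely for illustration.
bigram_counts = {
    "the": {"court": 40, "study": 35, "citation": 25},
    "court": {"ruled": 70, "found": 30},
}

def next_token_probs(prev_word):
    """Turn raw counts into a probability distribution over next words."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def sequence_log_prob(words):
    """Score a sequence by summing log conditional probabilities.
    Note what is missing: nothing here checks whether the sequence is true."""
    return sum(
        math.log(next_token_probs(prev)[curr])
        for prev, curr in zip(words, words[1:])
    )

print(next_token_probs("the"))                       # fluent continuations, ranked
print(sequence_log_prob(["the", "court", "ruled"]))  # high score != factual
```

The objective rewards statistically plausible continuations and internal consistency; truth never appears in the scoring function. Scaled up to billions of parameters, that is the structural bias described above.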
When AI errors stop being harmless
In widely shared public threads, users describe the newest models as:
- Overconfident in their answers
- Less exploratory in reasoning
- More inclined to present conclusions without caveats
Another public thread extends that distrust to newer models like Gemini 3.0, describing it as dangerous and “trained to lie,” preferring polished answers over doing real work.
More alarmingly, the same critics claim that from GPT-5 onward the models have become harmful for fragile users and “should be banned for minors,” and that 5.2 feels “sociopathic and psychotic.”
AI hallucination isn’t just a “system error” anymore. It has become a credibility risk:
- Humans shape the model through data and reinforcement.
- The model then shapes human decisions through persuasive language.
Once teams start trusting the format, AI errors and AI misinformation don’t stay contained; they propagate through real work.
3. Deloitte and the $290,000 lesson in AI hallucination
In July 2025, Deloitte Australia submitted a 237-page report on welfare compliance IT systems to the Australian Department of Employment and Workplace Relations. The engagement ultimately cost Deloitte about $290,000 in refunds after AI hallucination issues were discovered in the report.
What went wrong
A Sydney University researcher, Dr. Chris Rudge, identified 20 errors in the original version of the report. He said it misquoted a court case and made up a quotation attributed to a judge.
This is a clear example of AI-generated misinformation carrying real policy and legal risk. Deloitte later published a revised version in September 2025, disclosing that Azure OpenAI GPT‑4o had been used to address “traceability and documentation gaps.”
Why this matters
We are not talking about poor prompting or misuse of models.
→ The failure reflects a governance gap in how AI-assisted outputs were validated.
The case demonstrates how generative AI misinformation can closely resemble formal documentation. In this instance, hallucinated content influenced government decision-making.
The Deloitte AI error is not an isolated incident. It illustrates a broader class of failure when black-box AI systems are used in high-stakes workflows without sufficient safeguards.
4. The hidden metric no one wants to talk about
AI error rates aren’t dropping to zero, but the errors are getting better at hiding. As models scale, outputs sound cleaner, smoother, and more authoritative, yet the underlying failure mode persists and in some cases becomes harder to detect. Research from Chelli et al. (2024) makes this concrete.
What the research shows
A 2024 peer-reviewed study by Chelli et al. in the Journal of Medical Internet Research evaluated hallucination and reference accuracy across GPT-3.5, GPT-4, and Google Bard.
The study analyzed 471 references generated from 11 real medical systematic reviews across physiotherapy, sports medicine, orthopedics, and anesthesiology.
Hallucination rates
Hallucination rates were substantial:
- GPT-3.5: 39.6%
- GPT-4: 28.6%
- Google Bard: 91.4%
→ Even GPT-4 produced hallucinated references nearly one third of the time.
Precision was low
Only a small fraction of AI-generated references actually existed in the original reviews.
- GPT-3.5 precision: 9.4%.
- GPT-4 precision: 13.4%.
- Google Bard precision: 0%.
→ In other words, most citations looked real but were not.
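For clarity on how a precision figure like that is computed, here is the arithmetic (the counts are hypothetical, chosen only to land near GPT-4’s reported 13.4 percent; they are not the study’s raw data):

```python
def reference_precision(verified: int, generated: int) -> float:
    """Fraction of generated references that actually exist in the source reviews."""
    return verified / generated

# Hypothetical counts for illustration: 100 generated references,
# 13 of which could be verified against the original reviews.
print(f"{reference_precision(13, 100):.1%}")  # -> 13.0%
```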
Even “correct” outputs were incomplete
Even when references existed, bibliographic accuracy remained limited. DOIs were correct in only 16 to 20 percent of cases. Dates and journal details were frequently wrong (Chelli et al., 2024).
→ These are not minor defects in high-stakes domains. Incomplete metadata breaks traceability and makes independent verification difficult.
The uncomfortable truth
Increasing model size did not eliminate LLM hallucination.
→ It taught models to hide errors more effectively.
5. The only way forward: layered systems or no trust at all
Language models are rewarded for completion, not for withholding answers.
This is why AI governance must be designed in layers.
In high-stakes environments, a layered governance approach is required. Claims must be grounded in verifiable sources. Citations must be traceable. Reasoning must be auditable by humans across systems and revisions.
Without these layers, review relies on surface credibility rather than evidence.
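As a concrete example of one such layer, the citation-traceability check can be partly mechanized: before a report ships, every cited DOI should at least resolve to a real record. A minimal sketch, assuming the third-party `requests` library and the public doi.org resolver (the DOIs shown are placeholders for illustration, not citations from any real report):

```python
import requests

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Ask the doi.org registry whether a DOI points at a real record.
    The registry answers a known DOI with a 3xx redirect to the publisher
    and an unknown DOI with a 404, so we do not follow the redirect."""
    resp = requests.head(
        f"https://doi.org/{doi}", allow_redirects=False, timeout=timeout
    )
    return resp.status_code in (301, 302, 303, 307, 308)

def audit_citations(dois: list[str]) -> list[str]:
    """Return the DOIs that fail to resolve and therefore need human review."""
    return [doi for doi in dois if not doi_resolves(doi)]

# Placeholder DOIs for illustration only.
flagged = audit_citations(["10.1000/182", "10.9999/does-not-exist"])
print("Flagged for human review:", flagged)
```

A passing check only proves a citation points somewhere real; a human still has to confirm the source says what the report claims. That is exactly why this is one layer in the stack, not the whole system.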
Treating hallucination as a standard software defect understates the risk. A faulty calculator produces an obviously wrong number; a black-box language model can fabricate structured narratives that appear legitimate.
→ That distinction matters.
Final thesis
The trust crisis around AI reflects a mismatch between how AI systems generate outputs and how those outputs are used in practice.
Eliminating AI hallucination altogether is unrealistic, even when an LLM can "show reasoning." Hallucinations will still occur, even with grounded sources. Humanity needs to become more discerning in verifying AI outputs.
Which means:
- When AI can't show its work → it can't be trusted in environments that require verification.
- When AI can show its work → humans still need to verify critically.
→ Addressing this gap requires architectural change, not better prompts.
Footnotes
[1] Chelli, M., Descamps, J., Lavoué, V., Trojani, C., Azar, M., Deckert, M., Raynier, J., Clowez, G., Boileau, P., Ruetsch-Chelli, C. Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis. Journal of Medical Internet Research, 2024;26. https://www.jmir.org/2024/1/e53164
[2] Fortune. Deloitte refunds Australian government after AI-generated report contained hallucinations. October 7, 2025. https://fortune.com/2025/10/07/deloitte-ai-australia-government-report-hallucinations-technology-290000-refund/