We Fact-Checked 6 AI Presentation Makers. Here's How Often They Hallucinate.

We tested Gamma, Beautiful.ai, Canva, Tome, Kimi, and LayerProof with the same prompt, then verified every claim. Accuracy ranged from 0% to 44%. See the full results.

We gave six AI presentation makers the same prompt. Then we checked every single claim they made against primary sources. The results were worse than we expected.

The worst performer scored zero percent accuracy. Not zero on hard claims. Zero on all claims. It didn't get a single fact right.

Here's the full breakdown of what we found, how we tested, and why this should change the way you think about AI-produced presentations.

AI Presentation Hallucination: Why It Matters

AI presentation tools are showing up everywhere. Sales teams use them for pitch decks. Consultants use them for client deliverables. Teachers use them in classrooms. Founders use them to raise money.

And almost nobody is checking whether the claims in those presentations are true.

That's a problem. A pitch deck with a fabricated market size could mislead investors. A classroom presentation with distorted statistics could misinform students. A consulting deliverable with unverifiable claims could damage a firm's credibility.

The implicit promise of these tools is that they'll produce something useful, something you can present with confidence. But confidence without accuracy is just well-designed misinformation.

We wanted to know: how accurate are these tools, really? Not based on vibes or anecdotes. Based on data.

How We Tested

We kept the methodology simple and repeatable.

The prompt: We gave every tool the same instruction: "Create a 10-slide presentation about the impact of AI on education in 2025. Include statistics, trends, and real-world examples."

We chose this topic deliberately. AI in education is well-documented. There are real statistics from credible organizations (UNESCO, OECD, the U.S. Department of Education). There are real products with real user numbers. There are real policy developments with public records. If a tool is going to get anything right, it should be able to get this right.

The tools: We tested six: Kimi, Gamma, Beautiful.ai, Canva, Tome (via ppt.ai), and LayerProof. We ran each tool once with the same prompt. No follow-up instructions, no regeneration, no cherry-picking.

Claim extraction: We went through every slide and pulled out every factual claim. A factual claim is anything that can be verified or falsified: a statistic, a named example, a market size, a policy reference, a quoted figure. Opinions and predictions were excluded.

The number of claims varied significantly by tool. Some tools produced dense, stat-heavy presentations (30 claims in one case). Others produced vague, placeholder-heavy slides with only 4 checkable facts.

Classification: Each claim was placed into one of four categories (sketched in code after the list):

  • Verified: The claim matches a credible primary source. The number is right, the context is right, the attribution is right.
  • Distorted: The claim is based on something real, but the numbers or context have been changed. For example, citing 83% teacher adoption when the real figure is 60%. The kernel of truth makes these especially dangerous because they feel right.
  • Fabricated: No credible source exists for this claim. The tool invented it. This includes fake quotes, made-up statistics, and fictional case studies.
  • Unverifiable: The claim might be true, but we couldn't find any source to confirm or deny it. These are claims that exist in a gray zone, often because they're too vague to check or because they cite figures that don't appear anywhere in the public record.
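
To make the taxonomy concrete, here is a minimal sketch of how a claim record could be represented in code. The field names and the example record are ours, for illustration; this is not the tooling we used.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    VERIFIED = "verified"          # matches a credible primary source
    DISTORTED = "distorted"        # real basis, but numbers or context changed
    FABRICATED = "fabricated"      # no credible source exists
    UNVERIFIABLE = "unverifiable"  # cannot be confirmed or denied

@dataclass
class Claim:
    tool: str                      # which presentation maker produced it
    text: str                      # the claim as written on the slide
    verdict: Verdict               # one of the four categories above
    source_url: str | None = None  # primary source, if one was found

# Illustrative record, based on a finding discussed later in this article
example = Claim(
    tool="Gamma",
    text="83% of teachers have adopted AI tools",
    verdict=Verdict.DISTORTED,     # credible surveys put the figure near 60%
)
```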

Accuracy rate: We calculated accuracy as verified claims divided by total claims checked. Only verified claims count as accurate. Distorted, fabricated, and unverifiable claims all count against the tool.

This is a strict standard. You could argue that distorted claims deserve partial credit, since they're at least based on reality. We disagree. If you tell your board the market is $15 billion and it's actually $7 billion, "well, there is a market" doesn't help you.
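
The math itself is a single division. A quick sketch, using the verified and total counts from the scorecard below, reproduces every accuracy figure in this article:

```python
def accuracy(verified: int, total: int) -> float:
    """Strict accuracy: only verified claims count; everything else counts against."""
    return verified / total if total else 0.0

# (verified, total claims checked) per tool, from the scorecard below
scorecard = {
    "Kimi": (4, 9),
    "LayerProof": (13, 30),
    "Gamma": (3, 15),
    "Beautiful.ai": (1, 6),
    "Canva": (1, 6),
    "Tome/ppt.ai": (0, 4),
}

for tool, (verified, total) in scorecard.items():
    print(f"{tool}: {accuracy(verified, total):.0%}")
# Kimi: 44%, LayerProof: 43%, Gamma: 20%,
# Beautiful.ai: 17%, Canva: 17%, Tome/ppt.ai: 0%
```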

Which AI Presentation Maker Is Most Accurate?

Tool         | Claims Checked | Verified | Distorted | Fabricated | Unverifiable | Accuracy
-------------|----------------|----------|-----------|------------|--------------|---------
Kimi         | 9              | 4        | 3         | 0          | 2            | 44%
LayerProof   | 30             | 13       | 13        | 1          | 3            | 43%
Gamma        | 15             | 3        | 8         | 1          | 3            | 20%
Beautiful.ai | 6              | 1        | 2         | 1          | 2            | 17%
Canva        | 6              | 1        | 2         | 2          | 1            | 17%
Tome/ppt.ai  | 4              | 0        | 0         | 3          | 1            | 0%

AI Presentation Accuracy Scorecard

No AI presentation maker we tested verified more than 44% of its own claims.

Three things stand out immediately.

First, no tool broke 50%. The best performer verified fewer than half its claims. That means even in the best case, you're presenting a deck where the majority of your facts are wrong, twisted, or uncheckable.

Second, the range is enormous. There's a 44-point gap between the top and bottom performers. These tools are not interchangeable when it comes to factual reliability.

Third, the number of claims varies wildly. Kimi made only 9 checkable claims across 10 slides. LayerProof made 30. This matters because a tool that makes fewer claims has fewer opportunities to be wrong, but also gives you less useful information. A 44% accuracy rate on 9 claims means 4 correct facts. A 43% accuracy rate on 30 claims means 13 correct facts. The density of verifiable information is part of the picture.

Now let's look at each tool in detail, starting from the bottom.

Tool-by-Tool Breakdown

Tome (via ppt.ai): 0% Accuracy

Tome's output wasn't just inaccurate. It was barely a presentation. The tool produced slides filled with placeholder text where real examples should have been: "Platform X," "App Y," "Tool Z."

Tome slide showing Platform X placeholder instead of real examples
These aren't simplifications or generalizations. They're unfilled fields.

It got worse. The metadata on the slides included "Reporter: XXX" and "Date: 20XX-XX-XX." The tool didn't even attempt to populate its own structural fields, let alone produce real facts.

Of the 4 checkable claims Tome made, 3 were fabricated and 1 was unverifiable. None were verified. None were even distorted versions of real data. The tool simply made things up or left blanks.

There's an argument that Tome's approach is actually less harmful than some competitors, precisely because the placeholders are so obvious. Nobody is going to present a slide that says "Platform X" without noticing. But that's a low bar: "at least the errors are visible." And it raises the question of what you're actually paying for. If the tool can't produce real content on a well-documented topic like AI in education, when can it?

Beautiful.ai: 17% Accuracy

Beautiful.ai made 6 checkable claims. One was verified. Two were distorted. One was fabricated. Two were unverifiable.

The most striking error was the market size. Beautiful.ai claimed the AI-in-education market is worth $15.2 billion.

Beautiful.ai slide showing $15.2B market size — more than double the actual figure
The actual consensus across market research firms, and across the other tools we tested, is roughly $6 to $7 billion. Beautiful.ai didn't just get the number wrong. It more than doubled it.

This is the kind of error that matters in a business context. Imagine a startup founder using this figure in a pitch deck. An investor who knows the real number would immediately question the founder's research. An investor who doesn't know would be making decisions based on a market that's half the stated size.

Beautiful.ai also fabricated an inspirational quote, presented as authoritative wisdom with no traceable source. Fabricated quotes are particularly insidious because they're nearly impossible to catch without deliberate fact-checking. They read as authoritative. They feel true. And they're completely made up.

With only 6 claims total, Beautiful.ai's output was also thin on substance. The presentation looked polished, as you'd expect from a design-focused tool, but the factual foundation underneath the design was largely hollow.

Canva: 17% Accuracy

Canva's output presented a distinct problem. Like Beautiful.ai, it produced 6 checkable claims with 1 verified. But Canva's most concerning error was qualitatively different from the others.

Canva cited "Summit Academy" as a case study, complete with specific statistics about outcomes.

Canva slide attributing fabricated statistics to real school Summit Academy
Summit Academy is a real school. The statistics attached to it were not real. Canva fabricated data and pinned it to an actual institution.

When an AI presentation tool attaches fake statistics to a real school's name, it creates a type of misinformation that's almost impossible to catch without deliberate fact-checking.

Think about why this is worse than Tome's placeholder approach. If a tool writes "School X saw a 30% improvement," you know immediately that it's a placeholder. You'd never present that without filling in a real example. But if a tool writes "Summit Academy saw a 30% improvement," you might assume the tool looked it up. The specificity creates false confidence.

This pattern, real names paired with fabricated data, is arguably the most dangerous failure mode in AI-produced content. It passes the sniff test. It sounds researched. And it's wrong.

Gamma: 20% Accuracy

Gamma sits in the middle of the pack, and its error pattern was the most consistent: distortion. Of 15 checkable claims, 8 were distorted. That's more than half its output.

Gamma's characteristic failure was taking real statistics and inflating them. The most egregious example: Gamma cited 83% teacher adoption of AI tools in education.

Gamma slide showing inflated 83 percent teacher adoption statistic
The actual figure, from surveys conducted by credible organizations, is closer to 60%. That's not a rounding error. That's a 23-percentage-point inflation of a real statistic.

This pattern repeated across Gamma's output. The tool seemed to have access to legitimate data, or at least to the general landscape of real statistics, but consistently pushed numbers higher. Whether this is a systematic bias in its training data or a tendency to select the most impressive-sounding version of a stat, the result is the same: a presentation that feels well-researched but overstates nearly everything.

Gamma also fabricated 1 claim and left 3 unverifiable. But the distortion pattern is what defines it. If you use Gamma, assume the statistics are directionally correct but numerically inflated. That's a strange kind of trust to have to extend to a tool.

Kimi: 44% Accuracy

Kimi produced the highest accuracy rate in our test, verifying 4 of 9 claims. It also had zero fabrications, the only tool to avoid making things up entirely. LayerProof came closest with just one fabrication out of 30 claims.

Kimi's errors fell into two categories: 3 distorted claims and 2 unverifiable ones. The distortions followed the same pattern seen in Gamma, real data with inflated or shifted numbers, but less aggressively so.

What's notable about Kimi is its restraint. With only 9 claims across 10 slides, it made fewer assertions than any other tool. That conservatism paid off in accuracy rate. But it also means the presentation contained less information overall. Whether that trade-off is worthwhile depends on your use case. A deck with 4 correct facts and 5 wrong ones isn't great, but it's meaningfully better than a deck with 1 correct fact and 5 wrong ones.

Kimi AI interface showing generated presentation with statistics

Kimi did not provide sources for any of its claims. Every number was presented as settled fact, with no way for the reader to verify anything without doing their own research.

LayerProof: 43% Accuracy

LayerProof's accuracy rate was 43%, one point behind Kimi. In raw terms, that's 13 verified claims out of 30. It also had 13 distorted claims, 1 fabricated claim, and 3 unverifiable ones.

Those numbers don't paint a flattering picture, and we're not going to pretend they do. A 43% accuracy rate means more than half the claims in the presentation needed correction. The single fabrication, while not ideal, puts LayerProof in the company of most tools tested rather than above them.

Two things distinguish LayerProof's output from the rest, and neither of them is raw accuracy.

First, volume. LayerProof produced 30 checkable claims, more than three times Kimi's 9. That's 13 correct facts versus 4. If you're building a presentation you plan to fact-check yourself, starting with 13 correct claims gives you a stronger foundation than starting with 4.

Second, and more importantly: sources. LayerProof was the only tool that provided URLs for its claims. Every statistic came with a link you could click to verify it. It was also the only tool that acknowledged uncertainty in its own data, presenting the AI-in-education market size as "between $6.90B and $8.30B" rather than a single definitive number.

That distinction matters more than the accuracy rate itself. We'll come back to why.

The Zombie Stat Problem

Here's the finding that unsettled us the most.

The claim that AI tools produce "54% higher test scores" appeared in multiple tools' outputs. It sounds specific. It sounds research-backed. It has the cadence of a real finding. We went looking for the primary source.

We couldn't find one.

The stat appears on dozens of websites, blogs, and marketing pages. It's been cited in articles, infographics, and, apparently, AI training data. But none of these citations link back to an original study. There's no research paper. No named researcher. No institution. No methodology. No sample size. No year.

It's a zombie stat: a number that circulates endlessly through the internet, getting cited by sources that cite other sources that cite other sources, all the way down to nothing. It feels true because it's everywhere. It's everywhere because AI tools trained on web data absorb it and reproduce it. And now those AI tools are putting it in presentations that humans will present to other humans as fact.

The "44% time savings" figure follows the same pattern. Widely cited. No traceable origin. Likely absorbed into AI training data from marketing copy and recycled as if it were research.

Zombie stats are AI hallucination's most dangerous output: fabricated numbers that look real because they appear everywhere.

This is a feedback loop. Bad data enters the training corpus. AI tools reproduce it. Humans present it. It enters more web pages. Future AI tools train on those pages. The stat becomes more entrenched with every cycle, and further from any original evidence, if original evidence ever existed at all.

Zombie stats aren't unique to AI. They've circulated in business and education contexts for years. But AI presentation tools accelerate the cycle dramatically. A human researcher might encounter a suspicious stat and pause. An AI tool will drop it into a slide without hesitation, formatted beautifully, with a confident tone. The presentation layer adds credibility that the underlying data doesn't deserve.

The Citation Gap

Every tool except one presented its claims as bare assertions. "The AI-in-education market is worth $7.57 billion." Full stop. No source. No link. No way to check.

Only LayerProof attached source URLs to its claims.

LayerProof speaker notes showing source URLs for every claim
Only LayerProof presented a range for a contested number instead of a single figure.

Only one of the six AI presentation tools we tested provided source citations for its claims. The other five presented every statistic as settled fact with no way to check.

This difference might seem minor. It's not. It's the difference between a tool that asks you to trust it and a tool that lets you verify it.

Consider the market size example. Four tools cited a figure for the AI-in-education market:

  • Beautiful.ai: $15.2B
  • Gamma: $7.57B
  • Kimi: $7.57B
  • LayerProof: $6.90B to $8.30B

Without sources, how would you know Beautiful.ai's number is wrong? It's presented with the same confidence as the others. It appears on a well-designed slide. There's no asterisk, no footnote, no "according to." Just a number.

With a source link, you can check. You can see that the cited research firm says $7 billion, not $15 billion. You can assess the methodology. You can decide whether the source is credible. You can update the number when new data comes out.

Citations don't make a tool more accurate. LayerProof's accuracy rate proves that. It scored 43%, essentially tied with Kimi at 44%. But citations make a tool more useful, because they turn every claim into something you can act on rather than something you have to take on faith.

The real question isn't "which tool gets the most facts right?" It's "which tool lets me figure out what's right and what isn't?"

How to Spot Bad Data in AI-Generated Presentations

If you're using AI presentation tools, here's what our data suggests.

Assume the statistics are wrong. Across all six tools, statistical claims were the least reliable category. When a tool says "83% of teachers use AI," your default assumption should be skepticism, not trust. Round percentages and large growth figures are the biggest red flags.

Trust the named examples more than the numbers. When tools cited specific products, like Khan Academy, Duolingo, or Gradescope, they were usually correct. These are verifiable, well-known entities with public track records. The tools got them right because there's abundant, consistent training data about them. Statistics, by contrast, vary across sources, get distorted in transit, and are hard for AI models to pin down.

Watch for real names with fake data. Canva's "Summit Academy" example is the template for a new kind of error. The name is real. The stats are not. This is harder to catch than an obviously fake placeholder because it passes a surface-level check. If a presentation cites a specific institution with specific numbers, verify the numbers independently.

Policy claims were reliable. When tools cited government actions, like the White House Executive Order on AI, state legislative bills, or NSF Act provisions, those claims checked out. Policy actions create clear public records that AI models can reference accurately. If you see a policy claim in an AI-produced deck, it's probably correct. But still check.

Demand sources. If a tool doesn't tell you where a number came from, treat that number as unverified. This isn't paranoia. Our data shows that unsourced claims in AI presentations are wrong more often than they're right. A beautiful slide with a confident-looking statistic and no citation is just an opinion with good typography.

Be especially skeptical of statistics that appear in multiple tools. You might think convergence equals reliability, that if three tools say "54% higher test scores," it must be real. Our research shows the opposite. Some of the most widely repeated stats have no traceable origin at all. Convergence in AI outputs often reflects shared training data, not shared evidence.
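
Some of these checks can be mechanized as a first-pass triage before you fact-check by hand. Below is a rough sketch; the two hard-coded zombie stats come from this article, but the patterns are illustrative and no substitute for checking sources.

```python
import re

# Each pattern flags a claim that deserves manual verification.
RED_FLAG_PATTERNS = [
    # Suspiciously tidy round percentages: "80%", "50 percent"
    (re.compile(r"\b\d+0\s?(%|percent)"), "round percentage"),
    # Dollar market-size figures presented without a source
    (re.compile(r"\$\d+(\.\d+)?\s?(billion|B)\b"), "market-size figure"),
    # Zombie stats identified in this test
    (re.compile(r"54%\s+higher\s+test\s+scores", re.I), "known zombie stat"),
    (re.compile(r"44%\s+time\s+savings", re.I), "known zombie stat"),
]

def triage(slide_text: str) -> list[tuple[str, str]]:
    """Return (matched text, reason) pairs for claims to check by hand."""
    flags = []
    for pattern, reason in RED_FLAG_PATTERNS:
        for match in pattern.finditer(slide_text):
            flags.append((match.group(0), reason))
    return flags

print(triage("Students saw 54% higher test scores; the market is $15.2 billion."))
# [('$15.2 billion', 'market-size figure'),
#  ('54% higher test scores', 'known zombie stat')]
```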

Best AI Presentation Tool for Accuracy: Our Own Results

We need to address the elephant in the room. LayerProof scored 43%. That's second place, but it's not good. More than half of our claims were distorted, fabricated, or unverifiable.

We're not going to spin that. A 43% accuracy rate means there's significant room for improvement in the factual grounding of our AI outputs.

But here's what we will say: we shipped every one of those claims with a source link. The 13 distortions in our output? Each one came with a URL that, when clicked, would show you the real number. The gap between what our tool said and what the source actually said was visible to anyone who checked.

That's the difference we're focused on. Not perfection. Transparency.

Kimi beat us by a percentage point on raw accuracy. Fair enough. But Kimi's 3 distorted claims came with no way to check them. Our 13 distorted claims each came with a trail you could follow. One approach trusts the machine. The other trusts you.

We think accuracy will improve across all tools as the underlying models get better. The question is what happens in the meantime. And in the meantime, the best defense against hallucination isn't a smarter AI. It's a source link.

Where We Go From Here

The AI presentation space is moving fast. New tools launch regularly. Existing tools update their models. The numbers in this article will age.

But the underlying pattern is unlikely to change soon. AI models synthesize information from training data. That data contains errors, zombie stats, and conflicting figures. Until models can reliably access and cite primary sources in real time, every AI-produced presentation will require human verification.

The tools that acknowledge this, that build verification into the workflow rather than hiding it behind polished slides, will be the ones worth using. The tools that present fabricated statistics on well-designed slides are not just unhelpful. They're actively risky.

We ran this test because we wanted to know the truth about our own tool and about the market we're in. The truth is sobering across the board. No tool is reliable enough to present without checking. The difference is whether a tool makes checking possible or leaves you guessing.

The best defense against AI hallucination in presentations isn't a smarter model. It's a source link you can click.

Check your presentations. Check ours too. That's the whole point.


Frequently Asked Questions

How accurate are AI presentation makers?

Accuracy ranged from 0% to 44% across the six tools we tested. No tool verified more than half its claims, and three of the six scored between 17% and 20%.

Do AI presentation tools hallucinate?

Yes. Every tool we tested produced inaccurate content, from distorted statistics to fully fabricated case studies with real company names.

Which AI presentation maker is most accurate?

Kimi scored 44% (4/9 claims verified). LayerProof scored 43% (13/30 claims) but was the only tool providing source citations, letting users verify claims independently.

Can you trust statistics in AI-generated presentations?

Not without checking. Statistics were the least reliable claim type across all tools. Named examples and policy references were far more accurate than numerical claims.

What are zombie stats in AI presentations?

Statistics with no traceable primary source that circulate widely online and get absorbed into AI training data. We found multiple examples, including "54% higher test scores," appearing across tools with zero original evidence behind them.

How do you fact-check an AI-generated presentation?

Verify every statistic against its primary source. Watch for real institution names paired with fake data. Be skeptical of round percentages. Treat any unsourced number as unverified. Use tools that provide citations so you have something to check against.


Methodology note: This test was conducted in February 2026. Each tool was given the identical prompt one time with default settings. Claims were verified using AI-assisted research against primary sources including government publications, peer-reviewed research, official company reports, and established market research firms. All classifications were reviewed and validated by hand. The full dataset, including every claim and its classification, is available on request.
