Fact-Checking AI Articles, With a Receipt for Every Fix

AI writes a competent article in seconds. The problem is the confident, specific, and occasionally invented statistic buried in paragraph nine. A content team we worked with could not publish AI drafts because a single wrong number is a credibility problem. They did not want fewer drafts. They wanted a way to trust them. So we built a fact-checker that does not just flag issues, it fixes them and shows its work.

claims checked in one article

documented repairs

100%

of fixes carry a source

Three agents, one job each

The flow is a chain of three Claude Opus 4.8 agents, each with a single responsibility:

1. Extract and detect. The first agent reads the article and pulls out every factual claim as a discrete, checkable statement. It also flags internal contradictions, where the article disagrees with itself.

2. Verify against the web. The second agent takes each claim and checks it with live web search, returning a verdict (supported, unsupported, or contradicted) and a source URL for the evidence.

3. Repair and rewrite. The third agent rewrites the article so it is accurate, changing only what the evidence requires and leaving the rest of the author's voice intact.

What one real run looked like

On a full-length competitive comparison article, the pipeline pulled out 47 distinct claims. Of those, 38 were supported, 5 could not be verified, and 4 directly contradicted the evidence. The repair agent then made 12 corrections to the text. The most common failure was not fabrication out of thin air. It was subtle: a real statistic pointed the wrong way.

The fix ledger: a receipt for every change

The output is not just a cleaner article. It is a structured ledger where every repair carries its verdict, the source that justifies it, the original text, and the corrected text. An editor can approve the article in minutes because they are reviewing a short list of evidenced changes, not re-reading the whole piece. Here is one entry, anonymized:

VERDICT contradicted

Original: "Platform A has 3x the developer traction but half the reliability of Platform B."

What the evidence showed: The comparison was reversed. Platform B actually leads on both traction and reliability.

Corrected: "Platform B leads on developer traction and reliability," with a source link to the underlying figures.

That is the kind of error a human skims right past, because it reads fluently and cites plausible numbers. It is also exactly the kind of error that gets an article quote-tweeted for the wrong reasons.

The cost of trust

Checking every claim in a long article against the live web is not free. A full run used roughly 540,000 tokens and cost about $3.62. That sounds like a lot next to a raw draft, until you weigh it against a published correction, a lost source relationship, or an editor spending an afternoon verifying by hand. For high-stakes publishing, three dollars and a couple of minutes to ship with confidence is an easy trade. For a quick internal memo, you would skip it. The point is that it is your call, per piece.

Build a checker for your own content

This is a five-node flow you can assemble in Evaligo: an input, three agents, and an output that returns the corrected article plus the fix ledger. You can swap the models, tighten the verification rubric, or point it at your own trusted sources. Describe what you want to check, and Evaligo will draft the pipeline for you to refine.

Fact-Checking AI Articles, With a Receipt for Every Fix

Three agents, one job each

What one real run looked like

The fix ledger: a receipt for every change

The cost of trust

Build a checker for your own content

Danny Lev

Ready to Build This?

Need Help With Your Use Case?

Related Articles

Scoring Writing Proficiency Across Languages: A Model Bake-Off

Marketing Workflow Automation: From Manual to AI-Powered