Fact-Checking AI Articles, With a Receipt for Every Fix
A content team wanted to publish AI-drafted articles without shipping invented statistics. We built a three-agent pipeline that extracts every claim, verifies each one against the web, and rewrites the article, leaving an audit trail of every correction with its source.
AI writes a competent article in seconds. The problem is the confident, specific, and occasionally invented statistic buried in paragraph nine. A content team we worked with could not publish AI drafts because a single wrong number is a credibility problem. They did not want fewer drafts. They wanted a way to trust them. So we built a fact-checker that does not just flag issues: it fixes them and shows its work.
47 claims checked in one article
12 documented repairs
100% of fixes carry a source
Three agents, one job each
The flow is a chain of three Claude Opus 4.8 agents, each with a single responsibility:
1. Extract and detect. The first agent reads the article and pulls out every factual claim as a discrete, checkable statement. It also flags internal contradictions, where the article disagrees with itself.
2. Verify against the web. The second agent takes each claim and checks it with live web search, returning a verdict (supported, unsupported, or contradicted) and a source URL for the evidence.
3. Repair and rewrite. The third agent rewrites the article so it is accurate, changing only what the evidence requires and leaving the rest of the author's voice intact.
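The three steps above can be sketched as a plain function chain. This is a minimal illustration, not the production flow: the names `Claim`, `Verification`, and `run_pipeline` are hypothetical, and the three stub functions stand in for real model calls and web search so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

VERDICTS = ("supported", "unsupported", "contradicted")


@dataclass
class Claim:
    text: str


@dataclass
class Verification:
    claim: Claim
    verdict: str                # one of VERDICTS
    source_url: Optional[str]   # evidence link, if one was found


def run_pipeline(
    article: str,
    extract: Callable[[str], List[Claim]],
    verify: Callable[[Claim], Verification],
    repair: Callable[[str, List[Verification]], str],
) -> Tuple[str, List[Verification]]:
    """Chain the three stages: extract claims, verify each, repair the text."""
    claims = extract(article)
    results = [verify(claim) for claim in claims]
    for result in results:
        if result.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {result.verdict!r}")
    # Assumption: only claims that are not supported are sent to the repair
    # stage, so the rest of the text is left alone.
    flagged = [r for r in results if r.verdict != "supported"]
    return repair(article, flagged), results


# Stub stages (placeholders for real agents) so the sketch is runnable.
def stub_extract(article: str) -> List[Claim]:
    return [Claim(s.strip()) for s in article.split(".") if s.strip()]


def stub_verify(claim: Claim) -> Verification:
    verdict = "contradicted" if "3x" in claim.text else "supported"
    return Verification(claim, verdict, "https://example.com/source")


def stub_repair(article: str, flagged: List[Verification]) -> str:
    for result in flagged:
        marked = f"[needs correction: {result.claim.text}]"
        article = article.replace(result.claim.text, marked)
    return article


corrected, results = run_pipeline(
    "A has 3x the traction. B is older.", stub_extract, stub_verify, stub_repair
)
```

Keeping each stage behind a simple callable means any one agent can be swapped out or tested in isolation without touching the other two.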
What one real run looked like
On a full-length competitive comparison article, the pipeline pulled out 47 distinct claims. Of those, 38 were supported, 5 could not be verified, and 4 were directly contradicted by the evidence. The repair agent then made 12 corrections to the text. The most common failure was not fabrication out of thin air. It was subtle: a real statistic pointed the wrong way.
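The verdict counts from this run can be tallied as a quick sanity check. The sketch below only reuses the numbers quoted above; the variable names are illustrative, and the 12 corrections are a separate count of text changes that is not derived here.

```python
from collections import Counter

# Verdict counts reported for the run described above.
verdicts = ["supported"] * 38 + ["unsupported"] * 5 + ["contradicted"] * 4

tally = Counter(verdicts)
total_claims = sum(tally.values())                      # 47
needs_attention = tally["unsupported"] + tally["contradicted"]  # 9
supported_share = tally["supported"] / total_claims     # about 0.81

print(f"{total_claims} claims, {needs_attention} flagged, "
      f"{supported_share:.1%} supported")
```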
The fix ledger: a receipt for every change
The output is not just a cleaner article. It is a structured ledger where every repair carries its verdict, the source that justifies it, the original text, and the corrected text. An editor can approve the article in minutes because they are reviewing a short list of evidenced changes, not re-reading the whole piece. Here is one entry, anonymized:
Verdict: contradicted
Original: "Platform A has 3x the developer traction but half the reliability of Platform B."
What the evidence showed: The comparison was reversed. Platform B actually leads on both traction and reliability.
Corrected: "Platform B leads on developer traction and reliability," with a source link to the underlying figures.
That is the kind of error a human skims right past, because it reads fluently and cites plausible numbers. It is also exactly the kind of error that gets an article quote-tweeted for the wrong reasons.
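One way to represent a ledger entry like the one above is a small record with the four fields the ledger carries: verdict, source, original text, and corrected text. This is a sketch under assumptions: the class name `LedgerEntry`, the field names, and the placeholder URL are hypothetical, not the actual output schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LedgerEntry:
    verdict: str
    source_url: str
    original: str
    corrected: str

    def __post_init__(self) -> None:
        # Enforce the "every fix carries a source" rule at construction time.
        if not self.source_url:
            raise ValueError("every repair must carry a source")


entry = LedgerEntry(
    verdict="contradicted",
    source_url="https://example.com/underlying-figures",  # placeholder URL
    original=(
        "Platform A has 3x the developer traction but half the "
        "reliability of Platform B."
    ),
    corrected="Platform B leads on developer traction and reliability.",
)

# Serialize the ledger so an editor-facing tool could render it as a list.
ledger_json = json.dumps([asdict(entry)], indent=2)
```

Rejecting an entry without a source at construction time is one simple way to keep the "100% of fixes carry a source" property from depending on reviewer vigilance.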
The cost of trust
Checking every claim in a long article against the live web is not free. A full run used roughly 540,000 tokens and cost about $3.62. That sounds like a lot next to a raw draft, until you weigh it against a published correction, a lost source relationship, or an editor spending an afternoon verifying by hand. For high-stakes publishing, under four dollars and a couple of minutes to ship with confidence is an easy trade. For a quick internal memo, you would skip it. The point is that it is your call, per piece.
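To make the trade concrete, the run's figures can be broken down into unit costs. This is back-of-the-envelope arithmetic on the approximate numbers quoted above, so the results are rough; the blended per-token rate mixes input and output tokens and is not a published price.

```python
# Approximate figures from the run described in this article.
total_tokens = 540_000
total_cost_usd = 3.62
claims_checked = 47
repairs_made = 12

cost_per_million_tokens = total_cost_usd / (total_tokens / 1_000_000)
cost_per_claim = total_cost_usd / claims_checked
cost_per_repair = total_cost_usd / repairs_made

print(f"blended rate: ${cost_per_million_tokens:.2f} per million tokens")
print(f"per claim checked: ${cost_per_claim:.3f}")
print(f"per documented repair: ${cost_per_repair:.2f}")
```

On these numbers, each checked claim costs roughly eight cents and each documented repair roughly thirty cents.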
Build a checker for your own content
This is a five-node flow you can assemble in Evaligo: an input, three agents, and an output that returns the corrected article plus the fix ledger. You can swap the models, tighten the verification rubric, or point it at your own trusted sources. Describe what you want to check, and Evaligo will draft the pipeline for you to refine.