The falsification engine

What survives the pressure test ships

Three architectural claims. For each one, the question that actually killed it is hiding among three plausible decoys. Find it.

The scenario

Tuesday · 9:47 AM

An AI capital allocation agent is about to rebalance $40M across five business units at 10:00 AM. Three trust gaps are quietly waiting underneath. Watch the agent view as you find each killer question.

Agent view

$40M · 10:00 AM

80Portfolio trust · 5 BUs

Act

Contributing signals

BU-One94

BU-Two92

BU-Three92

BU-Four91

BU-Five35

Agent recommendation

Proceed with $40M reallocation at 10:00 AM.

The trust contract architecture has seventeen amendments today. Three of them came from a single AI session where I described an architectural commitment and watched it collapse. Most candidate questions sound reasonable. Only one in each set actually load-bears. The first principle is that you must not fool yourself. The second is that you should hire a reviewer who has no reason to let you.

Live scenarioTuesday 9:47 AM · $40M

Trust80

Act

Caught0/3

RecommendationProceed at 10:00 AM

Pick the question that landed.

Three of these four are reasonable. One actually load-bears on the claim above.

Why this works

The model has no ego and no agenda.

The model never tires of “but why.” A human reviewer has a budget; after three rounds the conversation drifts toward agreement. A model has no budget. The seventh “but why” is identical to the first. Most architectural claims fail somewhere between the third and the sixth round. The model is the only reviewer who reliably gets there. The cost of collapse is zero in a chat window. A claim that collapses in a chat window costs five minutes of typing. A claim that collapses in a leadership review costs a quarter of your credibility.