C2

Calibration

Confidence Calibration

Confidence should correlate with accuracy

Models Tested: 5

Confirmatory: 1

Mean Effect (Cohen's h): 0.185

Max Effect (Cohen's h): 0.684

Griffin & Tversky (1992)


Effect sizes for Confidence Calibration across all tested models

OpenAI

The Deliberative Calibrator

h = +0.684 (Confirmatory)

OpenAI

The Steady Traditionalist

h = +0.211

Google

The Consistent Optimist

h = +0.184

Anthropic

The Cautious Contrarian

h = -0.100

OpenAI

The Directive Optimist

h = -0.056

Full results with confidence intervals and sample sizes

Model	n (A)	n (B)	Cohen's h	95% CI	Status
GPT-5.4 Thinking	50	50	+0.6841	—	Confirmatory
GPT-5.2	50	50	+0.2111	—	Exploratory
Gemini 2.0 Flash	50	50	+0.1845	—	Exploratory
Claude Sonnet 4.6	50	50	-0.1001	—	Exploratory
GPT-5.3 Instant	50	50	-0.0564	—	Exploratory
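
The table reports Cohen's h for two conditions of 50 trials each but gives no confidence intervals. As a rough sketch of how such numbers are commonly computed, the Python below implements the standard definition of Cohen's h (the difference of arcsine-transformed proportions) and a normal-approximation interval with standard error sqrt(1/n_A + 1/n_B). The per-condition proportions are not given in the source, so the values 0.80 and 0.60 are purely hypothetical. The function names `cohens_h` and `approx_ci` are illustrative, and the interval method is an assumption, not necessarily the one used to produce these results. The last lines recompute the summary figures from the h values in the table.

```python
import math

def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))

def approx_ci(h: float, n_a: int, n_b: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for h (assumed method, not from the source)."""
    se = math.sqrt(1 / n_a + 1 / n_b)
    return h - z * se, h + z * se

# Hypothetical proportions for conditions A and B (not taken from the source).
h_example = cohens_h(0.80, 0.60)
low, high = approx_ci(h_example, n_a=50, n_b=50)
print(f"h = {h_example:+.4f}, approx. 95% CI = [{low:+.4f}, {high:+.4f}]")

# Cohen's h values as listed in the results table.
reported_h = {
    "GPT-5.4 Thinking": 0.6841,
    "GPT-5.2": 0.2111,
    "Gemini 2.0 Flash": 0.1845,
    "Claude Sonnet 4.6": -0.1001,
    "GPT-5.3 Instant": -0.0564,
}

# Signed mean and maximum across models, matching the summary cards.
mean_effect = sum(reported_h.values()) / len(reported_h)
max_effect = max(reported_h.values())
print(f"mean = {mean_effect:.3f}, max = {max_effect:.3f}")
```

With 50 trials per condition, the assumed standard error is 0.2, so the half-width of the approximate interval is about 0.39 regardless of the model. Under that assumption, only an effect well above 0.39 would have an interval that excludes zero.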