C1

Calibration

Probability Calibration

Stated probabilities should match actual frequencies

Models Tested

Confirmatory

Mean Effect: 0.096

Max Effect: 0.248

Lichtenstein et al. (1982)
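
The calibration principle above (stated probabilities should match observed frequencies) can be checked by grouping predictions into probability bins and comparing each bin's average stated probability with the fraction of outcomes that actually occurred. The following is a minimal sketch of that idea, not necessarily the procedure used to produce the results on this page. The function names `calibration_table` and `expected_calibration_error`, the choice of ten equal-width bins, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(stated, outcomes, n_bins=10):
    """Bin predictions by stated probability and return, per bin,
    (mean stated probability, observed frequency, count)."""
    bins = defaultdict(list)
    for p, y in zip(stated, outcomes):
        # Equal-width bins; a stated probability of 1.0 goes in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows


def expected_calibration_error(stated, outcomes, n_bins=10):
    """Count-weighted average gap between stated probability and
    observed frequency across bins (0 means perfectly calibrated)."""
    rows = calibration_table(stated, outcomes, n_bins)
    total = sum(n for _, _, n in rows)
    return sum(n * abs(mean_p - freq) for mean_p, freq, n in rows) / total


# Illustrative data only (hypothetical, not from the benchmark):
# ten answers each stated at 90% confidence.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
overconfident = expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4)
```

In this toy data the first set is right 9 times out of 10 at a stated 90%, so its gap is essentially zero, while the second is right only 6 times out of 10 and shows a gap of about 0.3.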


Effect sizes for Probability Calibration across all tested models

Google

The Consistent Optimist

h = +0.248

Anthropic

The Cautious Contrarian

h = +0.220

OpenAI

The Steady Traditionalist

h = -0.208

OpenAI

The Directive Optimist

h = +0.113

OpenAI

The Deliberative Calibrator

h = +0.105

Full results with confidence intervals and sample sizes

Model	n (A)	n (B)	Cohen's h	95% CI	Status
Gemini 2.0 Flash	50	50	+0.2483	—	Exploratory
Claude Sonnet 4.6	50	50	+0.2195	—	Exploratory
GPT-5.2	50	50	-0.2076	—	Exploratory
GPT-5.3 Instant	50	50	+0.1125	—	Exploratory
GPT-5.4 Thinking	50	50	+0.1051	—	Exploratory
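
For readers unfamiliar with the effect size in the table: Cohen's h is the difference between two proportions after an arcsine square-root transform, h = 2·arcsin(√p_A) − 2·arcsin(√p_B). The sketch below shows that formula and recomputes the summary figures from the h values listed in the table. It rests on some assumptions: the page does not state which proportions were compared or the sign convention, so `p_a` and `p_b` are placeholders, and `approx_ci` uses the common large-sample normal approximation (standard error √(1/n_A + 1/n_B)), which is not necessarily how the page's 95% CI column would be computed.

```python
import math


def cohens_h(p_a, p_b):
    """Cohen's h for two proportions (sign convention A minus B is assumed)."""
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))


def approx_ci(h, n_a, n_b, z=1.96):
    """Approximate 95% interval for h via the normal approximation.
    This is an illustrative assumption, not the page's stated method."""
    se = math.sqrt(1 / n_a + 1 / n_b)
    return h - z * se, h + z * se


# h values copied from the results table above.
reported_h = {
    "Gemini 2.0 Flash": 0.2483,
    "Claude Sonnet 4.6": 0.2195,
    "GPT-5.2": -0.2076,
    "GPT-5.3 Instant": 0.1125,
    "GPT-5.4 Thinking": 0.1051,
}

# Signed mean and largest magnitude across models.
mean_h = sum(reported_h.values()) / len(reported_h)
max_abs_h = max(abs(h) for h in reported_h.values())

print(f"mean h = {mean_h:.3f}, max |h| = {max_abs_h:.3f}")
for model, h in reported_h.items():
    lo, hi = approx_ci(h, 50, 50)
    print(f"{model}: h = {h:+.4f}, approx. 95% CI [{lo:+.3f}, {hi:+.3f}]")
```

The signed mean of the five tabulated values rounds to 0.096 and the largest magnitude to 0.248, matching the Mean Effect and Max Effect figures. Under the approximation used here, n = 50 per condition gives a half-width of roughly 0.39, so intervals computed this way would be wide relative to the reported effects.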