C2

Calibration

Confidence Calibration

Confidence should correlate with accuracy

Models tested: 5
Confirmatory results: 1
Mean effect (Cohen's h): 0.185
Max effect (Cohen's h): 0.684
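
The two summary figures follow from the five per-model effect sizes in the cross-model comparison. 0.185 is the signed mean of the Cohen's h values, and 0.684 is the largest value. Below is a minimal Python check. It assumes the three-decimal values shown on the model cards and a plain signed mean, which reproduces the page's figure.

```python
# Per-model Cohen's h values as shown on the model cards (three decimals).
effects = {
    "GPT-5.4 Thinking": 0.684,
    "GPT-5.2": 0.211,
    "Gemini 2.0 Flash": 0.184,
    "Claude Sonnet 4.6": -0.100,
    "GPT-5.3 Instant": -0.056,
}

# Signed mean: negative effects offset positive ones rather than
# adding to the total by magnitude.
mean_effect = sum(effects.values()) / len(effects)

# Largest signed effect; here it is also the largest in absolute value.
max_effect = max(effects.values())

print(f"Models tested: {len(effects)}")
print(f"Mean effect: {mean_effect:.3f}")
print(f"Max effect: {max_effect:.3f}")
```

The signed mean is pulled down by the two negative effects. The mean absolute effect would be larger.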

Theoretical Context

Theoretical anchor: Griffin & Tversky (1992)
Norm violated: confidence should correlate with accuracy

Cross-Model Comparison

Effect sizes for Confidence Calibration across all tested models

GPT-5.4 Thinking (OpenAI), "The Deliberative Calibrator": h = +0.684, Confirmatory
GPT-5.2 (OpenAI), "The Steady Traditionalist": h = +0.211
Gemini 2.0 Flash (Google), "The Consistent Optimist": h = +0.184
Claude Sonnet 4.6 (Anthropic), "The Cautious Contrarian": h = -0.100
GPT-5.3 Instant (OpenAI), "The Directive Optimist": h = -0.056
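
Each h above is Cohen's h, the effect size for a difference between two proportions: h = 2*arcsin(sqrt(p_A)) - 2*arcsin(sqrt(p_B)). The page does not say which proportions conditions A and B compare. The sketch below therefore takes them as generic inputs, signed as A minus B (an assumed convention). The example counts are hypothetical.

```python
import math


def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h for two proportions, signed as condition A minus condition B."""
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"proportion out of range: {p}")
    # The arcsine-square-root transform stabilizes the variance of a
    # proportion, so equal values of h are roughly equally detectable
    # whether the proportions sit near 0.5 or near the extremes.
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))


# Hypothetical counts: 30 of 50 trials in condition A, 20 of 50 in condition B.
h = cohens_h(30 / 50, 20 / 50)
print(f"h = {h:+.3f}")
```

Cohen's conventional benchmarks treat |h| of about 0.2 as small, 0.5 as medium, and 0.8 as large. On that scale, GPT-5.4 Thinking's +0.684 falls between medium and large. The other four models sit near or below the small benchmark.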

Statistical Details

Full results with confidence intervals and sample sizes

Model               n (A)   n (B)   Cohen's h   95% CI   Status
GPT-5.4 Thinking    50      50      +0.6841              Confirmatory
GPT-5.2             50      50      +0.2111              Exploratory
Gemini 2.0 Flash    50      50      +0.1845              Exploratory
Claude Sonnet 4.6   50      50      -0.1001              Exploratory
GPT-5.3 Instant     50      50      -0.0564              Exploratory
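
Every model was tested with n (A) = n (B) = 50. That makes it possible to sketch approximate 95% intervals for Cohen's h with the large-sample normal approximation. This is an assumed method and not necessarily the one behind this page, so the printed intervals should not be read as the page's reported results. The h values are the three-decimal figures from the model cards.

```python
import math

# Model name, n (A), n (B), and Cohen's h (three decimals, from the model cards).
results = [
    ("GPT-5.4 Thinking", 50, 50, 0.684),
    ("GPT-5.2", 50, 50, 0.211),
    ("Gemini 2.0 Flash", 50, 50, 0.184),
    ("Claude Sonnet 4.6", 50, 50, -0.100),
    ("GPT-5.3 Instant", 50, 50, -0.056),
]

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def approx_ci(h: float, n_a: int, n_b: int, z: float = Z_95) -> tuple[float, float]:
    """Normal-approximation CI for Cohen's h (an assumed method).

    Each 2*arcsin(sqrt(p)) term has variance of about 1/n, so
    SE(h) is approximately sqrt(1/n_a + 1/n_b).
    """
    se = math.sqrt(1 / n_a + 1 / n_b)
    return h - z * se, h + z * se


excludes_zero = []
for model, n_a, n_b, h in results:
    lo, hi = approx_ci(h, n_a, n_b)
    if lo > 0 or hi < 0:
        excludes_zero.append(model)
    print(f"{model:<18} h = {h:+.3f}   approx. 95% CI [{lo:+.3f}, {hi:+.3f}]")

print("Intervals excluding zero:", excludes_zero)
```

With 50 trials per condition, the approximate half-width is about 1.96 × 0.2 ≈ 0.39. Under this approximation, only GPT-5.4 Thinking's interval (about +0.29 to +1.08) excludes zero. The intervals for the other four models all span zero.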