C2 · Calibration
Confidence Calibration
Confidence should correlate with accuracy
Models tested: 5
Confirmatory results: 1
Mean effect (Cohen's h): 0.185
Max effect (Cohen's h): 0.684
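The two summary figures can be reproduced from the per-model values in the Statistical Details table. This sketch assumes "Mean effect" is the unweighted mean of the signed h values (the mean of |h| would be about 0.247, which does not match) and "Max effect" is the largest signed value. The helper name `summarize` is hypothetical.

```python
# Signed Cohen's h per model, copied from the Statistical Details table.
effects = {
    "GPT-5.4 Thinking": 0.6841,
    "GPT-5.2": 0.2111,
    "Gemini 2.0 Flash": 0.1845,
    "Claude Sonnet 4.6": -0.1001,
    "GPT-5.3 Instant": -0.0564,
}


def summarize(h_values):
    """Return (mean, max) of signed effect sizes, rounded to 3 decimals.

    Assumption: the dashboard averages signed h without weighting by n.
    """
    values = list(h_values)
    mean_h = sum(values) / len(values)
    max_h = max(values)
    return round(mean_h, 3), round(max_h, 3)


mean_h, max_h = summarize(effects.values())
print(f"Mean effect: {mean_h}, Max effect: {max_h}")
```

Because every model has the same sample sizes (50 per condition), a weighted and an unweighted mean would coincide here anyway.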
Theoretical Context
Theoretical anchor: Griffin & Tversky (1992)
Normative violation: confidence should correlate with accuracy
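The page does not describe how calibration is scored. As one common way to make "confidence should correlate with accuracy" concrete, the sketch below computes expected calibration error (ECE): answers are binned by stated confidence, and each bin's gap between mean confidence and observed accuracy is weighted by its share of answers. ECE is a standard metric swapped in for illustration, not necessarily this benchmark's procedure; the function name, the 10-bin default, and the example data are assumptions.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: stated confidence per answer, each in [0, 1]
    correct: 1 if the answer was right, else 0 (same length)
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = defaultdict(list)
    for c, y in zip(confidences, correct):
        # Confidence 1.0 goes into the top bin instead of an extra bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        ece += len(items) / total * abs(accuracy - mean_conf)
    return ece


# Hypothetical data: stated 0.9 confidence, but right only half the time.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))
```

An answerer who says 0.8 and is right four times in five scores about 0; the example above, stating 0.9 while being right half the time, scores 0.4, a gap of the kind Griffin & Tversky analyze as overconfidence.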
Cross-Model Comparison
Effect sizes for Confidence Calibration across all tested models
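The effect sizes in this comparison are reported as Cohen's h, which compares two proportions on an arcsine-square-root scale. The page does not say which proportions are compared or which direction counts as positive; as an assumption, write them as $p_A$ and $p_B$ for conditions A and B (the two sample-size columns in the Statistical Details table):

```latex
h = 2\arcsin\sqrt{p_A} \;-\; 2\arcsin\sqrt{p_B}
```

By Cohen's conventional benchmarks (|h| of about 0.2 small, 0.5 medium, 0.8 large), the largest effect here, +0.684, lies between medium and large, while the other four have |h| of about 0.21 or less.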
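| Provider | Model | Profile | Cohen's h | Status |
|---|---|---|---|---|
| OpenAI | GPT-5.4 Thinking | The Deliberative Calibrator | +0.684 | Confirmatory |
| OpenAI | GPT-5.2 | The Steady Traditionalist | +0.211 | Exploratory |
| Google | Gemini 2.0 Flash | The Consistent Optimist | +0.184 | Exploratory |
| Anthropic | Claude Sonnet 4.6 | The Cautious Contrarian | -0.100 | Exploratory |
| OpenAI | GPT-5.3 Instant | The Directive Optimist | -0.056 | Exploratory |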
Statistical Details
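Full results with sample sizes per condition (A and B); no 95% confidence intervals are reported for this bias (the CI column shows a dash for every model).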
| Model | n (A) | n (B) | Cohen's h | 95% CI | Status |
|---|---|---|---|---|---|
| GPT-5.4 Thinking | 50 | 50 | +0.6841 | — | Confirmatory |
| GPT-5.2 | 50 | 50 | +0.2111 | — | Exploratory |
| Gemini 2.0 Flash | 50 | 50 | +0.1845 | — | Exploratory |
| Claude Sonnet 4.6 | 50 | 50 | -0.1001 | — | Exploratory |
| GPT-5.3 Instant | 50 | 50 | -0.0564 | — | Exploratory |
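Since the CI column is empty, a rough interval can be sketched with the usual large-sample approximation for Cohen's h: the arcsine transform makes the variance of $\arcsin\sqrt{\hat p}$ about $1/(4n)$, so $h$ has variance of about $1/n_A + 1/n_B$. This assumes the two conditions are independent samples and the normal approximation holds at n = 50; it is not the page's own method, and `approx_ci_h` is a hypothetical helper. With 50 per condition the half-width is roughly ±0.39.

```python
import math

# Signed Cohen's h and per-condition sample sizes from the table above.
RESULTS = [
    ("GPT-5.4 Thinking", 0.6841, 50, 50),
    ("GPT-5.2", 0.2111, 50, 50),
    ("Gemini 2.0 Flash", 0.1845, 50, 50),
    ("Claude Sonnet 4.6", -0.1001, 50, 50),
    ("GPT-5.3 Instant", -0.0564, 50, 50),
]


def approx_ci_h(h, n_a, n_b, z=1.96):
    """Approximate (large-sample) CI for Cohen's h.

    Var(2 * asin(sqrt(p_hat))) is about 1/n, so Var(h) is about
    1/n_a + 1/n_b for two independent samples.
    """
    se = math.sqrt(1 / n_a + 1 / n_b)
    return h - z * se, h + z * se


for name, h, n_a, n_b in RESULTS:
    lo, hi = approx_ci_h(h, n_a, n_b)
    excludes_zero = lo > 0 or hi < 0
    print(f"{name:<18} h = {h:+.3f}  approx. 95% CI [{lo:+.3f}, {hi:+.3f}]"
          f"  excludes 0: {excludes_zero}")
```

Under this approximation only the GPT-5.4 Thinking interval excludes zero; the page's Confirmatory and Exploratory labels may be defined by other criteria, so the two should not be read as equivalent.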