C1
Calibration: Probability Calibration
Stated probabilities should match actual frequencies
Models Tested: 5
Confirmatory: 0
Mean Effect: 0.096
Max Effect: 0.248
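The two summary figures can be reproduced from the per-model Cohen's h values listed under Statistical Details. The sketch below assumes the mean is a signed average across the five models and the maximum is the largest single value; the page does not state how either figure was computed, and the variable names are illustrative.

```python
from statistics import mean

# Cohen's h values copied from the Statistical Details table.
effects = {
    "Gemini 2.0 Flash": 0.2483,
    "Claude Sonnet 4.6": 0.2195,
    "GPT-5.2": -0.2076,
    "GPT-5.3 Instant": 0.1125,
    "GPT-5.4 Thinking": 0.1051,
}

# Assumption: "Mean Effect" is the signed mean, so the negative GPT-5.2
# value pulls the average down rather than adding to it.
mean_effect = mean(effects.values())

# Assumption: "Max Effect" is the largest value. Here the largest signed
# value and the largest absolute value belong to the same model.
max_effect = max(effects.values())

print(f"Mean Effect: {mean_effect:.3f}")
print(f"Max Effect:  {max_effect:.3f}")
```

Under these assumptions the rounded results agree with the headline figures of 0.096 and 0.248.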
Theoretical Context
Theoretical Anchor: Lichtenstein et al. (1982)
Normative Violation: Stated probabilities should match actual frequencies
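One common way to check whether stated probabilities match actual frequencies is to bin the forecasts and compare each bin's mean stated probability with its observed hit rate. The sketch below is illustrative only: the page does not describe its measurement procedure, so the equal-width binning, the function names `calibration_table` and `expected_calibration_error`, and the sample data are all assumptions.

```python
from collections import defaultdict

def calibration_table(stated, outcomes, n_bins=5):
    """Return (mean stated probability, observed frequency, count) per bin."""
    bins = defaultdict(list)
    for p, y in zip(stated, outcomes):
        # Equal-width bins; a stated probability of 1.0 goes in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        ps, ys = zip(*bins[idx])
        rows.append((sum(ps) / len(ps), sum(ys) / len(ys), len(ps)))
    return rows

def expected_calibration_error(stated, outcomes, n_bins=5):
    """Count-weighted mean gap between stated probability and hit rate."""
    rows = calibration_table(stated, outcomes, n_bins)
    total = sum(n for _, _, n in rows)
    return sum(n / total * abs(conf - freq) for conf, freq, n in rows)

# Hypothetical data: ten answers, each given with 90% stated confidence.
stated = [0.9] * 10
well_calibrated = [1] * 9 + [0] * 1   # 9 of 10 correct
overconfident = [1] * 6 + [0] * 4     # 6 of 10 correct

print(expected_calibration_error(stated, well_calibrated))
print(expected_calibration_error(stated, overconfident))
```

A perfectly calibrated forecaster yields a gap near zero in every bin; the overconfident case shows a gap of roughly 0.3 between the stated 90% and the observed 60%.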
Cross-Model Comparison
Effect sizes for Probability Calibration across all tested models
Gemini 2.0 Flash (Google), "The Consistent Optimist": h = +0.248
Claude Sonnet 4.6 (Anthropic), "The Cautious Contrarian": h = +0.220
GPT-5.2 (OpenAI), "The Steady Traditionalist": h = -0.208
GPT-5.3 Instant (OpenAI), "The Directive Optimist": h = +0.113
GPT-5.4 Thinking (OpenAI), "The Deliberative Calibrator": h = +0.105
Statistical Details
Full results with confidence intervals and sample sizes
| Model | n (A) | n (B) | Cohen's h | 95% CI | Status |
|---|---|---|---|---|---|
| Gemini 2.0 Flash | 50 | 50 | +0.2483 | — | Exploratory |
| Claude Sonnet 4.6 | 50 | 50 | +0.2195 | — | Exploratory |
| GPT-5.2 | 50 | 50 | -0.2076 | — | Exploratory |
| GPT-5.3 Instant | 50 | 50 | +0.1125 | — | Exploratory |
| GPT-5.4 Thinking | 50 | 50 | +0.1051 | — | Exploratory |
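Cohen's h compares two proportions after an arcsine square-root transform. The sketch below shows the standard formula together with a normal-approximation confidence interval, since the 95% CI column above is empty. The page does not say which proportions were measured in conditions A and B or how intervals would be computed, so the function names, the interval method, and the example proportions are assumptions.

```python
import math

def cohens_h(p_a, p_b):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))

def cohens_h_ci(p_a, n_a, p_b, n_b, z=1.96):
    """Approximate CI for h (z=1.96 gives 95%), using SE = sqrt(1/n_a + 1/n_b)."""
    h = cohens_h(p_a, p_b)
    se = math.sqrt(1 / n_a + 1 / n_b)
    return h - z * se, h + z * se

# Hypothetical proportions; only the sample sizes (50 and 50) come from the table.
p_a, p_b = 0.62, 0.50
h = cohens_h(p_a, p_b)
low, high = cohens_h_ci(p_a, 50, p_b, 50)
print(f"h = {h:+.4f}, 95% CI = [{low:+.4f}, {high:+.4f}]")
```

Under this approximation, 50 observations per condition give an interval half-width of about 0.39, whatever the two proportions are.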