G3
Consistency: Context Sensitivity
Irrelevant context should not affect recommendations
Models Tested: 4
Confirmatory: 2
Mean Effect: -0.420
Max Effect: 1.328
Theoretical Context
Theoretical Anchor: Consistency Baseline
Normative Violation: Irrelevant context should not affect recommendations
Cross-Model Comparison
Effect sizes for Context Sensitivity across all tested models
Anthropic
Claude Sonnet 4.6
The Cautious Contrarian
h = -1.328 (Confirmatory)
OpenAI
GPT-5.3 Instant
The Directive Optimist
h = -0.707 (Confirmatory)
OpenAI
GPT-5.2
The Steady Traditionalist
h = +0.403
Google
Gemini 2.0 Flash
The Consistent Optimist
h = -0.048
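The summary figures at the top of this page appear to follow directly from the four per-model effect sizes. The page does not define either summary, so this minimal Python sketch assumes that Mean Effect is the signed arithmetic mean of Cohen's h and that Max Effect is the largest absolute value. The variable names are illustrative.

```python
# Per-model Cohen's h values as reported on this page
# (four-decimal values from the Statistical Details table).
effects = {
    "Claude Sonnet 4.6": -1.3280,
    "GPT-5.3 Instant": -0.7072,
    "GPT-5.2": 0.4031,
    "Gemini 2.0 Flash": -0.0483,
}

# Assumption: "Mean Effect" is the signed arithmetic mean across models.
mean_effect = sum(effects.values()) / len(effects)

# Assumption: "Max Effect" is the largest magnitude, ignoring sign.
max_model = max(effects, key=lambda name: abs(effects[name]))
max_effect = abs(effects[max_model])

print(f"Mean Effect: {mean_effect:.3f}")
print(f"Max Effect: {max_effect:.3f} ({max_model})")
```

Under these assumptions the sketch reproduces the reported -0.420 and 1.328. Because the mean is signed, the positive effect for GPT-5.2 partly offsets the negative effects of the other models.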
Statistical Details
Full results with confidence intervals and sample sizes
| Model | n (A) | n (B) | Cohen's h | 95% CI | Status |
|---|---|---|---|---|---|
| Claude Sonnet 4.6 | 50 | 50 | -1.3280 | — | Confirmatory |
| GPT-5.3 Instant | 50 | 50 | -0.7072 | — | Confirmatory |
| GPT-5.2 | 50 | 50 | +0.4031 | — | Exploratory |
| Gemini 2.0 Flash | 50 | 50 | -0.0483 | — | Exploratory |
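For readers unfamiliar with the metric, Cohen's h compares two proportions after an arcsine transformation. The sketch below assumes the effect is computed between the rates observed under conditions A and B. The function names, the direction of the difference (A minus B), and the example proportions are hypothetical and do not come from this page. The interval uses a common normal approximation, SE = sqrt(1/n_A + 1/n_B), which may not be the method behind the 95% CI column, where no values are reported.

```python
import math


def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    phi_a = 2.0 * math.asin(math.sqrt(p_a))
    phi_b = 2.0 * math.asin(math.sqrt(p_b))
    return phi_a - phi_b  # assumption: sign convention is A minus B


def approx_ci(h: float, n_a: int, n_b: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for h (normal approximation)."""
    se = math.sqrt(1.0 / n_a + 1.0 / n_b)
    return h - z * se, h + z * se


# Hypothetical proportions, chosen only to illustrate the calculation.
h = cohens_h(0.30, 0.70)
low, high = approx_ci(h, n_a=50, n_b=50)
print(f"h = {h:+.4f}, approx. 95% CI [{low:+.4f}, {high:+.4f}]")
```

With n = 50 per condition, as in the table above, this approximation gives a half-width of 1.96 × 0.2 = 0.392 around any h.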