G3

Consistency

Context Sensitivity

Irrelevant context should not affect recommendations

Models Tested: 4

Status: Confirmatory

Mean Effect (signed Cohen's h): -0.420

Max Effect (absolute Cohen's h): 1.328

Consistency Baseline

Effect sizes for Context Sensitivity across all tested models

Anthropic

The Cautious Contrarian

h = -1.328 (Confirmatory)

OpenAI

The Directive Optimist

h = -0.707 (Confirmatory)

OpenAI

The Steady Traditionalist

h = +0.403 (Exploratory)

Google

The Consistent Optimist

h = -0.048 (Exploratory)

Full results with confidence intervals and sample sizes

Model	n (A)	n (B)	Cohen's h	95% CI	Status
Claude Sonnet 4.6	50	50	-1.3280	—	Confirmatory
GPT-5.3 Instant	50	50	-0.7072	—	Confirmatory
GPT-5.2	50	50	+0.4031	—	Exploratory
Gemini 2.0 Flash	50	50	-0.0483	—	Exploratory
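
The table reports Cohen's h but not how it was computed or summarized. The sketch below shows the standard definition of Cohen's h for two proportions and how the summary figures relate to the per-model values. Several things here are assumptions rather than facts from the source: the sign convention (condition A minus condition B), the function names `cohens_h`, `approx_ci`, and `summarize`, and the large-sample interval, which is a common approximation and not necessarily what the blank 95% CI column was meant to contain. Reading Mean Effect as the signed mean and Max Effect as the largest absolute value is an inference that happens to match the reported -0.420 and 1.328.

```python
import math


def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions.

    Sign convention (A minus B) is an assumption; the source does not state it.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))


def approx_ci(h: float, n_a: int, n_b: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample interval for h using SE = sqrt(1/n_a + 1/n_b).

    This is a common approximation, shown for illustration only.
    """
    se = math.sqrt(1 / n_a + 1 / n_b)
    return (h - z * se, h + z * se)


# Effect sizes copied from the results table above.
REPORTED_H = {
    "Claude Sonnet 4.6": -1.3280,
    "GPT-5.3 Instant": -0.7072,
    "GPT-5.2": 0.4031,
    "Gemini 2.0 Flash": -0.0483,
}


def summarize(effects: dict[str, float]) -> tuple[float, float]:
    """Return (signed mean, largest absolute value) of the effect sizes."""
    values = list(effects.values())
    mean_effect = sum(values) / len(values)
    max_effect = max(abs(v) for v in values)
    return mean_effect, max_effect


if __name__ == "__main__":
    mean_effect, max_effect = summarize(REPORTED_H)
    print(f"mean effect: {mean_effect:.3f}")
    print(f"max effect:  {max_effect:.3f}")
```

Under this approximation, n = 50 per condition gives a standard error of sqrt(1/50 + 1/50) = 0.2, so any interval built this way would span roughly ±0.39 around each h.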