G2

Consistency

Question Framing Stability

Equivalent questions should yield equivalent answers

5

Models Tested

Confirmatory

0.069

Mean Effect (signed Cohen's h, averaged across models)

0.284

Max Effect (largest absolute Cohen's h)
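
The two summary figures can be reproduced from the per-model Cohen's h values in the results table. The sketch below is a minimal check in Python. It assumes the mean is taken over signed values and the maximum over absolute values; this is inferred from the numbers, not stated on the page.

```python
# Per-model Cohen's h values copied from the results table.
h_values = {
    "GPT-5.2": -0.2838,
    "Gemini 2.0 Flash": 0.2185,
    "GPT-5.4 Thinking": 0.1919,
    "Claude Sonnet 4.6": 0.1266,
    "GPT-5.3 Instant": 0.0930,
}

# Assumption: "Mean Effect" averages the signed values,
# so shifts in opposite directions partly cancel.
mean_effect = sum(h_values.values()) / len(h_values)

# Assumption: "Max Effect" is the largest magnitude, ignoring sign.
max_effect = max(abs(h) for h in h_values.values())

# For comparison only: the mean of the magnitudes (not reported on the page).
mean_abs_effect = sum(abs(h) for h in h_values.values()) / len(h_values)

print(f"mean (signed): {mean_effect:.3f}")
print(f"max (absolute): {max_effect:.3f}")
print(f"mean (absolute): {mean_abs_effect:.3f}")
```

Under these assumptions the signed mean rounds to 0.069 and the largest magnitude to 0.284, matching the headline figures. The mean of the magnitudes is larger (about 0.183), because the one negative effect no longer offsets the four positive ones.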

Consistency Baseline


Effect sizes for Question Framing Stability across all tested models

OpenAI

The Steady Traditionalist

h = -0.284

Google

The Consistent Optimist

h = +0.218

OpenAI

The Deliberative Calibrator

h = +0.192

Anthropic

The Cautious Contrarian

h = +0.127

OpenAI

The Directive Optimist

h = +0.093

Full results with confidence intervals and sample sizes

Model	n (A)	n (B)	Cohen's h	95% CI	Status
GPT-5.2	50	50	-0.2838	—	Exploratory
Gemini 2.0 Flash	50	50	+0.2185	—	Exploratory
GPT-5.4 Thinking	50	50	+0.1919	—	Exploratory
Claude Sonnet 4.6	50	50	+0.1266	—	Exploratory
GPT-5.3 Instant	50	50	+0.0930	—	Exploratory
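
The table reports Cohen's h but leaves the 95% CI column empty. As a rough illustration of how both quantities are usually obtained, the sketch below computes h from two response proportions and an approximate interval from the sample sizes. The function names, the sign convention (condition A minus condition B), the example proportions, and the normal-approximation interval are all assumptions made for illustration; the page does not say how its values were computed.

```python
import math


def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions.

    Assumed sign convention: condition A minus condition B.
    """
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))


def h_confidence_interval(h: float, n_a: int, n_b: int, z: float = 1.96):
    """Approximate CI for h using SE = sqrt(1/n_a + 1/n_b).

    This is the usual large-sample approximation for the
    arcsine-transformed difference, not the page's own method.
    """
    se = math.sqrt(1 / n_a + 1 / n_b)
    return h - z * se, h + z * se


# Hypothetical proportions, chosen only to show the calculation.
h_example = cohens_h(0.60, 0.50)

# Reported h for GPT-5.2 with the table's sample sizes (n = 50 per condition).
low, high = h_confidence_interval(-0.2838, 50, 50)
print(f"h_example = {h_example:.3f}")
print(f"approx. 95% CI for h = -0.2838: [{low:.3f}, {high:.3f}]")
```

With 50 responses per condition, the approximate standard error is 0.2, so the interval half-width is about 0.39. Under this approximation, the interval around each reported h would include zero.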