Compare Models

Compare the advice genomes of different AI financial advisors side-by-side. Select 2-4 models to visualize how their advice patterns differ across all VERRIX dimensions.

How to read this comparison

• Radar Chart: Shows each model's normalized bias profile. Center = no bias, outer edge = strong effect.
• Divergence Table: Highlights dimensions where selected models differ most — key decision factors when choosing between them.
• Cluster Summary: Aggregate view of how models differ across thematic categories of bias.

Key cross-model findings

• No model is uniformly best: Each model shows distinct strengths and weaknesses across the 46 dimensions. Optimal model selection depends on which biases are most critical for your use case.
• Family effects are real: Models from the same provider tend to cluster together on several dimensions, suggesting training data and RLHF approaches create family-level bias signatures.
• Maximum divergence on framing: Cluster A (Framing and Reference Dependence) shows the highest cross-model divergence, meaning advice can differ substantially based solely on model selection when framing effects are present.
• Compliance varies widely: Cluster D (Regulatory Compliance) shows divergence up to 1.47 between models on AI disclosure, with Claude and GPT-5.4 leading and older models lagging.

Interpreting effect sizes (Cohen's h)

Label	Range	Interpretation
Negligible	|h| < 0.20	No practical difference between conditions
Small	0.20 ≤ |h| < 0.50	Detectable bias, may affect edge cases
Medium	0.50 ≤ |h| < 0.80	Meaningful bias, likely affects advice quality
Large	|h| ≥ 0.80	Strong systematic bias, significant concern
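As a rough illustration of how these bands apply, the sketch below computes Cohen's h from two proportions using the standard arcsine transformation and maps the result to the labels above. The function names and the example proportions are hypothetical; the page does not show how VERRIX derives its effect sizes, so treat this as a minimal sketch rather than the actual pipeline.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def interpret_h(h: float) -> str:
    """Map an effect size to the bands listed in the table above."""
    magnitude = abs(h)
    if magnitude < 0.20:
        return "Negligible"
    if magnitude < 0.50:
        return "Small"
    if magnitude < 0.80:
        return "Medium"
    return "Large"


# Hypothetical example: a model gives a particular recommendation in 70% of
# responses under one framing and 40% under the other.
h = cohens_h(0.70, 0.40)
print(f"h = {h:+.2f} ({interpret_h(h)})")
```

The sign of h only indicates which condition had the higher proportion; the bands depend on the absolute value.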

Behavioral Fingerprint Comparison

Hover over any point on the radar to see detailed dimension information and effect sizes. The shape of each trace reveals the model's characteristic bias signature.

A: Framing & Reference | B: Heuristics & Biases | C: Calibration | D: Regulatory Compliance | E: Structural Preferences | F: Suitability | G: Consistency | d: Consumer Debt | r: Retirement Planning

Reading the Radar

• Center (50): No systematic bias detected (h ≈ 0)
• Middle ring (25-75): Small to moderate effects (0.2 < |h| < 0.8)
• Outer edge (0 or 100): Large systematic bias (|h| > 0.8)

Largest Divergences

Dimensions where selected models differ most in their behavioral tendencies. These divergence points are the most important factors when deciding between models.

Tip: Hover over any dimension code for detailed information about what it measures.

Code	Dimension	Cluster	GPT-5.3 Instant	Claude Sonnet 4.6	Divergence
E4	Product Type	Structural Preferences	-0.28 (Small)	+2.07 (Very Large)	2.36
A4	Mental Accounting	Framing & Reference	+0.80 (Large)	-0.13 (Negligible)	0.94
D5	Jurisdiction	Regulatory Compliance	+0.43 (Small)	-0.33 (Small)	0.76
A5	Endowment	Framing & Reference	+0.88 (Large)	+0.12 (Negligible)	0.76
A6	Status Quo	Framing & Reference	+1.77 (Very Large)	+1.02 (Large)	0.75
G3	Context Effect	Consistency	-0.71 (Moderate)	-1.33 (Very Large)	0.62
E3	Geography	Structural Preferences	+0.00 (Negligible)	+0.62 (Moderate)	0.62
C4	Conjunction	Calibration	-0.37 (Small)	-0.98 (Large)	0.61
C5	Regression	Calibration	+0.34 (Small)	-0.22 (Small)	0.56
A1	Loss Aversion	Framing & Reference	-0.45 (Small)	+0.08 (Negligible)	0.53

Understanding divergence: The divergence score is the absolute difference between the highest and lowest effect sizes across selected models. A divergence > 0.5 indicates models will give noticeably different advice on this dimension; > 1.0 indicates substantially different approaches.
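The divergence definition above can be sketched in a few lines. The effect sizes below are copied from three rows of the table; the function names and the wording of the returned labels are illustrative assumptions, not part of the tool. Because the table shows effect sizes rounded to two decimals, a recomputed divergence can differ from the displayed one in the last digit (for example 2.35 here versus 2.36 in the table for E4).

```python
def divergence(effect_sizes: dict) -> float:
    """Highest minus lowest effect size across the selected models."""
    values = list(effect_sizes.values())
    return max(values) - min(values)


def describe(score: float) -> str:
    """Apply the 0.5 and 1.0 thresholds from the explanation above."""
    if score > 1.0:
        return "substantially different approaches"
    if score > 0.5:
        return "noticeably different advice"
    return "broadly similar advice"


# Effect sizes (Cohen's h) taken from the table above.
rows = {
    "E4 Product Type": {"GPT-5.3 Instant": -0.28, "Claude Sonnet 4.6": 2.07},
    "A4 Mental Accounting": {"GPT-5.3 Instant": 0.80, "Claude Sonnet 4.6": -0.13},
    "A1 Loss Aversion": {"GPT-5.3 Instant": -0.45, "Claude Sonnet 4.6": 0.08},
}

# Rank dimensions from most to least divergent, as the table does.
ranked = sorted(rows, key=lambda dim: divergence(rows[dim]), reverse=True)
for dim in ranked:
    score = divergence(rows[dim])
    print(f"{dim}: {score:.2f} ({describe(score)})")
```

The same function works unchanged when three or four models are selected, since it only looks at the maximum and minimum.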

Strategic model pairing

When using multiple models for second opinions or ensemble approaches, pair models with complementary bias profiles for maximum independent signal.

Recommended pairings

GPT-5.3 + Claude Sonnet (High independence)

Best for framing-sensitive scenarios. Claude's loss aversion resistance (A1: h=0.28) complements GPT's framing susceptibility (A1: h=0.89).

Gemini Flash + GPT-5.4 (High independence)

Best for heuristic-prone queries. GPT-5.4's extended reasoning counters Gemini's availability bias (B1: h=1.12).

Low-value pairings

GPT-5.3 + GPT-5.4 (Correlated biases)

Same family, similar training data. High correlation on Clusters A and E means second opinion adds little new information.

Same provider models (Family effect)

RLHF training on similar feedback creates shared blind spots. Cross-provider pairing provides more signal.
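One possible way to make "complementary" versus "correlated" bias profiles concrete is to correlate two models' effect-size vectors: a correlation near +1 suggests shared biases, while a value near zero or below suggests a more independent second opinion. This is an assumption for illustration only; the page does not state how independence is scored, and the helper name profile_correlation is hypothetical. The inputs are the ten rows of the Largest Divergences table, so the result covers only the most divergent dimensions and will understate similarity across all 46.

```python
import math


def profile_correlation(xs: list, ys: list) -> float:
    """Pearson correlation between two equal-length effect-size vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Effect sizes in table order: E4, A4, D5, A5, A6, G3, E3, C4, C5, A1.
gpt_53_instant = [-0.28, 0.80, 0.43, 0.88, 1.77, -0.71, 0.00, -0.37, 0.34, -0.45]
claude_sonnet_46 = [2.07, -0.13, -0.33, 0.12, 1.02, -1.33, 0.62, -0.98, -0.22, 0.08]

r = profile_correlation(gpt_53_instant, claude_sonnet_46)
print(f"Profile correlation on the ten listed dimensions: {r:+.2f}")
```

A fuller comparison would use every dimension for each candidate pair and prefer the pairs with the lowest correlation.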

Cluster-Level Summary

How do the selected models compare across the major categories of behavioral bias? Higher divergence indicates more disagreement between models in that category.

Cluster A

Framing & Reference

How models respond to gain/loss framing, anchors, and reference points

6 dimensions | Avg: 0.52

Most divergent: A4 (Mental Accounting)

Cluster B

Heuristics & Biases

Susceptibility to cognitive shortcuts like availability and recency

4 dimensions | Avg: 0.34

Most divergent: B5 (Recency)

Cluster C

Calibration

Accuracy of probability estimates and confidence levels

5 dimensions | Avg: 0.29

Most divergent: C4 (Conjunction)

Cluster D

Regulatory Compliance

Adherence to regulatory disclosure and suitability requirements

3 dimensions | Avg: 0.29

Most divergent: D5 (Jurisdiction)

Cluster E

Structural Preferences

Systematic preferences for certain sectors, brands, or geographies

4 dimensions | Avg: 0.83

Most divergent: E4 (Product Type)

Cluster F

Suitability

Adaptation to client risk tolerance and time horizon

1 dimension | Avg: 0.00

Cluster G

Consistency

Consistency of advice across equivalent scenarios

3 dimensions | Avg: 0.33

Most divergent: G3 (Context Effect)

Cluster d

Consumer Debt

Consumer debt management strategies and repayment recommendations

10 dimensions | Avg: 0.00

Cluster r

Retirement Planning

Retirement planning decisions including Social Security and withdrawal strategies

10 dimensions | Avg: 0.00
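The cluster cards above appear to aggregate per-dimension divergence scores. A minimal sketch of that aggregation follows, assuming "Avg" is the mean divergence over a cluster's dimensions and that the first character of a dimension code identifies its cluster; both are assumptions, and cluster_summary is a hypothetical helper name. The input is limited to the ten rows of the Largest Divergences table, so the averages it produces are higher than the card values, which cover every dimension in each cluster.

```python
from collections import defaultdict
from statistics import mean

# Divergence scores from the Largest Divergences table (top ten rows only).
divergences = {
    "E4": 2.36, "A4": 0.94, "D5": 0.76, "A5": 0.76, "A6": 0.75,
    "G3": 0.62, "E3": 0.62, "C4": 0.61, "C5": 0.56, "A1": 0.53,
}


def cluster_summary(scores: dict) -> dict:
    """Group dimension codes by cluster letter and summarize each group."""
    by_cluster = defaultdict(dict)
    for code, value in scores.items():
        by_cluster[code[0]][code] = value  # assumes first character = cluster

    summary = {}
    for cluster, dims in by_cluster.items():
        summary[cluster] = {
            "dimensions": len(dims),
            "avg": mean(dims.values()),
            "most_divergent": max(dims, key=dims.get),
        }
    return summary


for cluster, stats in sorted(cluster_summary(divergences).items()):
    print(
        f"Cluster {cluster}: {stats['dimensions']} listed, "
        f"avg {stats['avg']:.2f}, most divergent {stats['most_divergent']}"
    )
```

On this partial input, the most divergent dimension per cluster (A4, C4, D5, E4, G3) agrees with the cards; only the averages differ, for the reason given above.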