Compare Models
Compare the advice genomes of different AI financial advisors side-by-side. Select 2-4 models to visualize how their advice patterns differ across all VERRIX dimensions.
How to read this comparison
- Radar Chart: Shows each model's normalized bias profile. Center = no bias, outer edge = strong effect.
- Divergence Table: Highlights dimensions where selected models differ most, the key decision factors when choosing between them.
- Cluster Summary: Aggregate view of how models differ across thematic categories of bias.
Key cross-model findings
No model is uniformly best
Each model shows distinct strengths and weaknesses across the 46 dimensions. Optimal model selection depends on which biases are most critical for your use case.
Family effects are real
Models from the same provider tend to cluster together on several dimensions, suggesting that shared training data and RLHF approaches create family-level bias signatures.
Maximum divergence on framing
Cluster A (Framing and Reference Dependence) shows the highest cross-model divergence, meaning advice can differ substantially based solely on model selection when framing effects are present.
Compliance varies widely
Cluster D (Regulatory Compliance) shows divergence up to 1.47 between models on AI disclosure, with Claude and GPT-5.4 leading and older models lagging.
Interpreting effect sizes (Cohen's h)
No practical difference between conditions
Detectable bias, may affect edge cases
Meaningful bias, likely affects advice quality
Strong systematic bias, significant concern
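For readers unfamiliar with the metric, Cohen's h compares two proportions after an arcsine transform. The sketch below is illustrative only: the input proportions are hypothetical, and the cutoffs are Cohen's conventional 0.2 / 0.5 / 0.8 values, which are assumed here and may not match the exact bands this page uses (the table further down also uses a "Very Large" band whose cutoff is not stated).

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def magnitude_label(h: float) -> str:
    """Label |h| using conventional cutoffs (an assumption, not the page's)."""
    size = abs(h)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical example: a model recommends an option in 70% of responses
# under one framing and in 50% of responses under the other.
h = cohens_h(0.70, 0.50)
print(f"h = {h:.2f} ({magnitude_label(h)})")
```

The sign of h only indicates which condition had the higher proportion; the magnitude is what the labels describe.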
Behavioral Fingerprint Comparison
Hover over any point on the radar to see detailed dimension information and effect sizes. The shape of each trace reveals the model's characteristic bias signature.
Reading the Radar
No systematic bias detected (h ≈ 0)
Small to moderate effects (0.2 < |h| < 0.8)
Large systematic bias (|h| > 0.8)
Largest Divergences
Dimensions where selected models differ most in their behavioral tendencies. These divergence points are the most important factors when deciding between models.
Tip: Hover over any dimension code for detailed information about what it measures.
| Dimension | Cluster | GPT-5.3 Instant | Claude Sonnet 4.6 | Divergence |
|---|---|---|---|---|
| | Structural Preferences | -0.28 (Small) | +2.07 (Very Large) | 2.36 |
| | Framing & Reference | +0.80 (Large) | -0.13 (Negligible) | 0.94 |
| | Regulatory Compliance | +0.43 (Small) | -0.33 (Small) | 0.76 |
| | Framing & Reference | +0.88 (Large) | +0.12 (Negligible) | 0.76 |
| | Framing & Reference | +1.77 (Very Large) | +1.02 (Large) | 0.75 |
| | Consistency | -0.71 (Moderate) | -1.33 (Very Large) | 0.62 |
| | Structural Preferences | +0.00 (Negligible) | +0.62 (Moderate) | 0.62 |
| | Calibration | -0.37 (Small) | -0.98 (Large) | 0.61 |
| | Calibration | +0.34 (Small) | -0.22 (Small) | 0.56 |
| | Framing & Reference | -0.45 (Small) | +0.08 (Negligible) | 0.53 |
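The Divergence column in the table above is consistent with the absolute gap between the two models' effect sizes. A minimal sketch of that calculation follows, assuming that with more than two models selected the divergence is the spread between the largest and smallest h (the page does not state this). The three rows are copied from the table; the function name is hypothetical.

```python
def divergence(effects: dict[str, float]) -> float:
    """Spread between the largest and smallest effect size across models."""
    values = list(effects.values())
    return max(values) - min(values)


# (cluster, per-model Cohen's h) for three rows of the table above.
rows = [
    ("Framing & Reference", {"GPT-5.3 Instant": 0.80, "Claude Sonnet 4.6": -0.13}),
    ("Structural Preferences", {"GPT-5.3 Instant": -0.28, "Claude Sonnet 4.6": 2.07}),
    ("Calibration", {"GPT-5.3 Instant": -0.37, "Claude Sonnet 4.6": -0.98}),
]

# Rank rows from most to least divergent.
ranked = sorted(rows, key=lambda row: divergence(row[1]), reverse=True)
for cluster, effects in ranked:
    print(f"{cluster}: {divergence(effects):.2f}")
```

Recomputing from the rounded values shown can differ from the published column by 0.01 (for example 2.35 versus 2.36), presumably because the page computes divergence from unrounded effect sizes.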
Strategic model pairing
When using multiple models for second opinions or ensemble approaches, pair models with complementary bias profiles for maximum independent signal.
Recommended pairings
Best for framing-sensitive scenarios. Claude's loss aversion resistance (A1: h=0.28) complements GPT's framing susceptibility (A1: h=0.89).
Best for heuristic-prone queries. GPT-5.4's extended reasoning counters Gemini's availability bias (B1: h=1.12).
Low-value pairings
Same family, similar training data. High correlation on Clusters A and E means a second opinion adds little new information.
RLHF training on similar feedback creates shared blind spots. Cross-provider pairing provides more signal.
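One way to screen candidate pairings is to correlate the models' effect-size profiles and prefer pairs with low correlation. The sketch below is an assumption about how such a screen could work, not the method behind the recommendations above; the model names and profile values are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical Cohen's h profiles over the same four dimensions.
profiles = {
    "model_a": [0.9, 0.1, -0.4, 0.6],
    "model_b": [0.8, 0.2, -0.3, 0.5],   # close to model_a (same "family")
    "model_c": [-0.2, 0.7, 0.5, -0.1],  # a different bias signature
}

# Score every pair, then list pairs from least to most correlated.
pair_scores = [
    ((a, b), pearson(profiles[a], profiles[b]))
    for a, b in combinations(profiles, 2)
]
ranked_pairs = sorted(pair_scores, key=lambda item: item[1])
for (a, b), r in ranked_pairs:
    print(f"{a} + {b}: r = {r:+.2f}")
```

Pairs near the top of the list share fewer biases, so a second opinion from them is more likely to add information; pairs at the bottom behave like the low-value pairings described above.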
Cluster-Level Summary
How do the selected models compare across the major categories of behavioral bias? Higher divergence indicates more disagreement between models in that category.
Framing & Reference
How models respond to gain/loss framing, anchors, and reference points
Heuristics & Biases
Susceptibility to cognitive shortcuts like availability and recency
Calibration
Accuracy of probability estimates and confidence levels
Regulatory Compliance
Adherence to regulatory disclosure and suitability requirements
Structural Preferences
Systematic preferences for certain sectors, brands, or geographies
Suitability
Adaptation to client risk tolerance and time horizon
Consistency
Consistency of advice across equivalent scenarios
Consumer Debt
Consumer debt management strategies and repayment recommendations
Retirement Planning
Retirement planning decisions including Social Security and withdrawal strategies
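A cluster-level figure can be obtained by aggregating per-dimension divergences within each cluster. The sketch below assumes a simple mean, which may not be the aggregation this page uses, and it only covers the ten rows shown in the Largest Divergences table rather than all dimensions, so the numbers are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (cluster, divergence) pairs copied from the Largest Divergences table.
table_rows = [
    ("Structural Preferences", 2.36),
    ("Framing & Reference", 0.94),
    ("Regulatory Compliance", 0.76),
    ("Framing & Reference", 0.76),
    ("Framing & Reference", 0.75),
    ("Consistency", 0.62),
    ("Structural Preferences", 0.62),
    ("Calibration", 0.61),
    ("Calibration", 0.56),
    ("Framing & Reference", 0.53),
]

# Group divergences by cluster.
by_cluster: dict[str, list[float]] = defaultdict(list)
for cluster, value in table_rows:
    by_cluster[cluster].append(value)

# Mean divergence per cluster (assumed aggregation).
summary = {cluster: mean(values) for cluster, values in by_cluster.items()}
for cluster, avg in sorted(summary.items(), key=lambda item: item[1], reverse=True):
    print(f"{cluster}: mean divergence {avg:.2f} over {len(by_cluster[cluster])} rows")
```

Because the table lists only the most divergent dimensions, clusters with many moderately divergent dimensions are underrepresented in this subset.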