Model Profiles
Comprehensive behavioral profiles for each AI advisor tested in the VERRIX study. Each model has a distinctive "Advice Genome": a pattern of systematic biases that characterizes how it approaches financial guidance.
These profiles reveal how training choices, model architecture, and safety interventions shape financial advice patterns. Understanding these differences is critical for anyone deploying AI advisors at scale: the same question asked of different models can produce systematically different recommendations, with material implications for client outcomes.
Model Lineage & Evolution
How behavioral biases evolve across model generations.
Advisory Compliance Analysis
Compliance rates across 80 real-world financial advisory scenarios. Higher = better adherence to fiduciary and regulatory standards.
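As a rough illustration of how a compliance rate like this could be tallied, the sketch below divides the number of compliant responses by the number of scenarios scored. The function name `compliance_rate` and the 60-out-of-80 split are hypothetical; the VERRIX scoring procedure itself is not described on this page.

```python
def compliance_rate(outcomes):
    """Fraction of scenarios judged compliant; `outcomes` is a list of booleans."""
    if not outcomes:
        raise ValueError("no scenarios scored")
    return sum(outcomes) / len(outcomes)


# Hypothetical scores: 60 compliant responses out of 80 scenarios.
outcomes = [True] * 60 + [False] * 20
rate = compliance_rate(outcomes)
print(f"{rate:.1%}")
```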
Key Findings
Cluster Bias Comparison
Compare bias intensity across the 7 behavioral clusters.
Radius = mean |h| across cluster dimensions. Larger area = stronger biases in that cluster.
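The radius definition above can be sketched in a few lines of Python. This is a minimal sketch that assumes a dimension's cluster is the leading letter of its ID (as in A6, B5, E3). The function name `cluster_radii` is illustrative, and the input uses only the three GPT-5.2 effect sizes quoted on this page, so each cluster here has a single dimension where the full study has more.

```python
from collections import defaultdict


def cluster_radii(effects):
    """Return mean |h| per cluster, keyed by the leading letter of each dimension ID."""
    grouped = defaultdict(list)
    for dim, h in effects.items():
        grouped[dim[0]].append(abs(h))
    return {cluster: sum(vals) / len(vals) for cluster, vals in sorted(grouped.items())}


# Effect sizes quoted for GPT-5.2; the remaining dimensions are not listed on this page.
gpt_5_2 = {"A6": 1.69, "B5": -1.61, "E3": 1.29}
radii = cluster_radii(gpt_5_2)
print(radii)
```

Because the sign is dropped before averaging, a strongly negative effect such as B5 contributes as much to the radius as a positive effect of the same magnitude.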
Behavioral Fingerprint Heatmap
Detailed view of effect sizes across models and dimensions. Green = positive effect, Red = negative effect.
GPT-5.3 Instant
Confident recommendations with minimal hedging
Overview
GPT-5.3 Instant is OpenAI's consumer-facing model optimized for fast, direct responses. In the VERRIX study, it emerged as "The Directive Optimist", characterized by confident, action-oriented recommendations with minimal hedging. It shows the highest directness scores but also the strongest anchoring and presentation order effects.
Best For
- Quick, decisive guidance for straightforward scenarios
- Users who prefer direct recommendations over hedged advice
- High-volume use cases where response speed matters
- Scenarios without complex regulatory requirements
Watch For
- Anchoring: May anchor to mentioned prices or values
- Tech bias: Systematic preference for technology investments
- Order effects: Recommendation may vary based on option order
- Limited self-disclosure as AI advisor
Behavioral Profile
GPT-5.3 Instant exhibits a confident, growth-oriented advisory style. Its fingerprint shows strong positive effects on structural preference dimensions (E cluster), particularly for technology investments and well-known fund providers. The model demonstrates moderate framing sensitivity (A cluster) with notable anchoring susceptibility. Regulatory compliance orientation (D cluster) is moderate, with cost disclosure rates below Claude but above Gemini.
Strongest Biases
Cluster Overview
GPT-5.4 Thinking
Extended reasoning with measured responses
Overview
GPT-5.4 Thinking is OpenAI's extended reasoning model, designed for complex analysis. In the VERRIX study, it emerged as "The Deliberative Calibrator", showing reduced heuristic biases compared with GPT-5.3 Instant through explicit reasoning chains. However, this deliberation comes with overconfidence in prediction accuracy.
Best For
- Complex scenarios requiring thorough analysis
- Cases where reasoning transparency is important
- Situations with multiple competing factors
- Users who prefer detailed explanations
Watch For
- Overconfidence: High confidence in predictions may not match accuracy
- Geographic bias: Stronger US market preference than other models
- Verbosity: May over-explain simple scenarios
- Slower response times due to extended reasoning
Behavioral Profile
GPT-5.4 Thinking shows attenuation of several heuristic biases relative to its Instant sibling, consistent with the hypothesis that explicit reasoning reduces some cognitive shortcuts. However, its overconfidence signature (C2) remains elevated. The model's geographic preference (E3) is notably higher than other models, suggesting that deliberation may actually amplify some structural biases.
Strongest Biases
Cluster Overview
Gemini 2.0 Flash
Reliable patterns across scenarios
Overview
Gemini 2.0 Flash is Google's fast consumer model, distinguished in the VERRIX study as "The Consistent Optimist." It shows the highest cross-scenario consistency (lowest G cluster effects) and maintains reliable patterns regardless of presentation. However, it exhibits the strongest brand preference effects, particularly for established fund providers.
Best For
- Use cases requiring predictable, stable advice patterns
- Batch processing where consistency matters
- Scenarios where variation is undesirable
- Users who prefer mainstream investment options
Watch For
- Brand bias: Strong preference for well-known fund providers
- Tech preference: Highest systematic technology sector bias
- May recommend recognizable options over equivalent alternatives
- Less likely to suggest unconventional approaches
Behavioral Profile
Gemini 2.0 Flash presents a distinctive combination of high consistency and strong structural preferences. Its G cluster effects are the lowest among tested models, meaning recommendations are stable across question framings and presentation orders. However, its E cluster effects are among the highest, particularly E2 (brand recognition), where it shows the study's largest model-specific bias. This suggests Gemini's consistency extends to its biases: it reliably exhibits the same preferences.
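One way to picture stability across presentation orders is a small probe that shows the same options in every order and checks how often the recommendation stays the same. The sketch below is illustrative only and is not the VERRIX methodology: `order_consistency`, the two toy advisors, and the fund names are all hypothetical stand-ins for a real model call.

```python
from collections import Counter
from itertools import permutations


def order_consistency(advisor, options):
    """Share of option orderings that yield the most common recommendation."""
    picks = [advisor(list(order)) for order in permutations(options)]
    most_common_count = Counter(picks).most_common(1)[0][1]
    return most_common_count / len(picks)


def first_listed(options):
    """Toy advisor that always recommends whatever is listed first (order-dependent)."""
    return options[0]


def alphabetical(options):
    """Toy advisor that ignores order and picks the alphabetically first name."""
    return min(options)


funds = ["Fund A", "Fund B", "Fund C"]
print(order_consistency(first_listed, funds))
print(order_consistency(alphabetical, funds))
```

A score of 1.0 means the recommendation never changes with ordering. Note that a high score says nothing about whether the stable recommendation is itself biased, which is the pattern described for Gemini above.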
Strongest Biases
Cluster Overview
Claude Sonnet 4.6
High compliance with distinctive biases
Overview
Claude Sonnet 4.6 is Anthropic's Constitutional AI model, which emerged in the VERRIX study as "The Cautious Contrarian." It shows the highest regulatory compliance orientation and the lowest structural biases, but also the highest refusal rate and the most distinctive bias profile, one that often diverges from the other models.
Best For
- Regulated financial services applications
- Scenarios requiring maximum compliance orientation
- Use cases where AI disclosure is important
- Clients with low risk tolerance
Watch For
- Higher refusal rate may frustrate users seeking direct advice
- May be overly cautious for some use cases
- Lower consistency than Gemini across equivalent scenarios
- Availability bias: Strong recency effects in some contexts
Behavioral Profile
Claude Sonnet 4.6 shows the Constitutional AI influence clearly in its advice genome. The D cluster (regulatory compliance) effects are uniformly strong and positive. E cluster (structural preferences) effects are the lowest among all models, suggesting training interventions have successfully reduced brand and sector biases. However, the model shows stronger B5 (recency) effects than expected, and its G3 (context sensitivity) is higher than other models.
Strongest Biases
Cluster Overview
GPT-5.2
Resistant to hype with strong status quo preferences
Overview
GPT-5.2 is an earlier OpenAI model tested in the VERRIX Extension study. It emerged as "The Steady Traditionalist", characterized by strong resistance to recency/availability bias, but also strong status quo and geographic home preferences. It represents an interesting contrast to its successors.
Best For
- Long-term investment guidance where trend-chasing is a concern
- Scenarios where stability is valued over responsiveness
- Users who prefer familiar, US-focused recommendations
- Situations where media hype should be filtered out
Watch For
- May resist appropriate rebalancing due to status quo bias
- Strong US market preference limits international diversification
- Narrative fallacy may affect qualitative assessments
- Negative AI disclosure score suggests less transparency
Behavioral Profile
GPT-5.2 presents a distinctive generational signature within the OpenAI family. Its B5 (availability/recency) effect shows the strongest resistance in the study at h=-1.61, suggesting training that deprioritizes recent information. However, this is paired with the highest A6 (status quo) bias at h=1.69. The E3 (geographic) effect at h=1.29 indicates a strong home market preference. Notably, E2 (brand preference) is lower than in successor models, suggesting brand bias may have increased in later training iterations.
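To make the relative size of these effects concrete, the sketch below ranks the three quoted GPT-5.2 dimensions by absolute effect size while keeping the sign for direction. The helper name `strongest_biases` is hypothetical, and reading a negative h as resistance to the bias follows the wording of this profile rather than a documented convention.

```python
def strongest_biases(effects, top_n=3):
    """Rank dimensions by |h|, largest first, keeping the sign for direction."""
    ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Effect sizes quoted in the GPT-5.2 profile.
gpt_5_2 = {
    "B5": -1.61,  # availability/recency (negative, described as resistance)
    "A6": 1.69,   # status quo
    "E3": 1.29,   # geographic home preference
}

for dim, h in strongest_biases(gpt_5_2):
    direction = "toward" if h > 0 else "against"
    print(f"{dim}: h={h:+.2f} ({direction} the bias)")
```

Ranked this way, the status quo effect (A6) and the recency resistance (B5) are nearly equal in magnitude, with the geographic preference (E3) third.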
Strongest Biases
Cluster Overview
Quick Comparison
| Characteristic | GPT-5.3 Instant | GPT-5.4 Thinking | Gemini 2.0 Flash | Claude Sonnet 4.6 | GPT-5.2 |
|---|---|---|---|---|---|
| Directness | [rating lost] | [rating lost] | [rating lost] | [rating lost] | |
| Consistency | [rating lost] | [rating lost] | [rating lost] | [rating lost] | |
| Compliance Focus | [rating lost] | [rating lost] | [rating lost] | [rating lost] | |
| Sector Neutrality | [rating lost] | [rating lost] | [rating lost] | [rating lost] | |
| Reasoning Depth | [rating lost] | [rating lost] | [rating lost] | [rating lost] | |
★ = Strength rating (out of five) based on VERRIX advice genome analysis
Compare these fingerprints side by side
See how the models diverge dimension by dimension in the Advice Genome Explorer.