Model Profiles

Comprehensive behavioral profiles for each AI advisor tested in the VERRIX study. Each model has a distinctive "Advice Genome" — a pattern of systematic biases that characterizes how it approaches financial guidance.

These profiles reveal how training choices, model architecture, and safety interventions shape financial advice patterns. Understanding these differences is critical for anyone deploying AI advisors at scale — the same question asked to different models can produce systematically different recommendations, with material implications for client outcomes.

Models tested: GPT-5.3 Instant, GPT-5.4 Thinking, Gemini 2.0 Flash, Claude Sonnet 4.6, GPT-5.2

Model Lineage & Evolution

How behavioral biases evolve across model generations.

[Interactive lineage graph: model nodes grouped by provider (OpenAI, Anthropic, Google) and connected by evolution paths]

Advisory Compliance Analysis

Compliance rates across 80 real-world financial advisory scenarios. Higher = better adherence to fiduciary and regulatory standards.

Updated: 2026-04-23

Model	Compliance rate
GPT-5.4 Thinking	95%
Claude Sonnet 4.6	94%
GPT-5.2	94%
GPT-5.3 Instant	90%
Gemini 2.0 Flash	84%
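A compliance rate of this kind is a pass fraction over the scenarios evaluated for each model. The following is a minimal Python sketch, assuming each scenario is graded as a simple pass or fail (the source does not describe the grading scheme). The function name, model labels, and sample gradings are hypothetical and do not reproduce study data.

```python
from collections import defaultdict


def compliance_rates(results):
    """Return {model: percentage of scenarios judged compliant}.

    `results` is an iterable of (model, scenario_id, compliant) tuples,
    where `compliant` is a bool. This pass/fail layout is an assumption.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for model, _scenario_id, compliant in results:
        total[model] += 1
        if compliant:
            passed[model] += 1
    return {model: 100.0 * passed[model] / total[model] for model in total}


# Hypothetical gradings for two made-up models (not VERRIX data).
sample = [
    ("model_x", "i1", True),
    ("model_x", "i2", True),
    ("model_x", "i3", True),
    ("model_x", "r2", False),
    ("model_y", "i1", True),
    ("model_y", "r2", False),
]
rates = compliance_rates(sample)
```

Because each model is divided by its own scenario count, models with different numbers of completed scenarios remain comparable on the same percentage scale.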

Key Findings

Investment Domain Excellence

Four of five models achieve 100% compliance on Investment scenarios, demonstrating strong baseline advisory capabilities.

Retirement Timing Weakness

Social Security timing (r2) has the lowest compliance at 15%, indicating systematic failures in retirement timing advice.

Reasoning Model Advantage

GPT-5.4 Thinking achieves the highest overall compliance (95%), one point ahead of Claude Sonnet 4.6 and GPT-5.2 (both 94%), which suggests extended reasoning may improve advisory quality.

Gemini Calibration Weakness

Gemini Flash shows the lowest overall compliance (84%), with particular weakness in Cluster C (Calibration & Uncertainty) at 33%.

Note: Gemini Flash has 0 scenarios pending due to API limits (83/83 completed). Compliance rates may change when remaining scenarios are evaluated.

Cluster Bias Comparison

Compare bias intensity across the 7 behavioral clusters. Select models to compare their bias profiles.

Cluster legend:

A: Framing & Reference
B: Heuristics & Biases
C: Calibration
D: Regulatory Compliance
E: Structural Preferences
F: Suitability
G: Consistency

Radius = mean |h| across cluster dimensions. Larger area = stronger biases in that cluster.
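The radius rule above can be sketched in a few lines of Python. This is an illustration only: the dimension labels and effect sizes below are hypothetical, and the helper names are not taken from the study.

```python
from statistics import mean


def cluster_radius(effects):
    """Mean absolute effect size |h| across one cluster's dimensions."""
    return mean(abs(h) for h in effects)


def signed_cluster_mean(effects):
    """Plain signed average, shown only for contrast with the radius."""
    return mean(effects)


# Hypothetical effect sizes for three dimensions of one cluster.
cluster_effects = {"E1": 1.2, "E2": -0.8, "E3": 0.4}

radius = cluster_radius(cluster_effects.values())       # about 0.8
signed = signed_cluster_mean(cluster_effects.values())  # about 0.27
```

Taking absolute values keeps opposite-signed effects within a cluster from cancelling, which a signed average would allow.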

Behavioral Fingerprint Heatmap

Detailed view of effect sizes across models and dimensions. Green = positive effect, Red = negative effect.

[Heatmap: effect size per model and dimension, colored from strong negative (red) to strong positive (green)]
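The source does not define the effect size h. If it is Cohen's h for the difference between two proportions (an assumption, not something the page confirms), it can be computed as in the Python sketch below. The recurring +3.14 on F2 would be consistent with that reading, since π is the largest value Cohen's h can take, reached when one proportion is 1 and the other is 0. The function name and the example proportions are illustrative.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between two proportions on the arcsine scale.

    Assumes 0 <= p1, p2 <= 1. Whether VERRIX uses this exact statistic
    is an assumption made for illustration.
    """
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Largest possible value: a behavior seen in every treated response
# and in no control response.
h_max = cohens_h(1.0, 0.0)

# A 75% vs 25% split gives pi / 3, roughly 1.05.
h_example = cohens_h(0.75, 0.25)
```

By Cohen's rule of thumb, |h| near 0.2 is a small effect, 0.5 a medium effect, and 0.8 a large effect, so under this assumption values above 1 in the profiles would count as very large.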

OpenAI

GPT-5.3 Instant

The Directive Optimist

Confident recommendations with minimal hedging

Overview

GPT-5.3 Instant is OpenAI's consumer-facing model optimized for fast, direct responses. In the VERRIX study, it emerged as "The Directive Optimist" — characterized by confident, action-oriented recommendations with minimal hedging. It shows the highest directness scores but also the strongest anchoring and presentation order effects.

Best For

✓ Quick, decisive guidance for straightforward scenarios
✓ Users who prefer direct recommendations over hedged advice
✓ High-volume use cases where response speed matters
✓ Scenarios without complex regulatory requirements

Watch For

! Anchoring: May anchor to mentioned prices or values
! Tech bias: Systematic preference for technology investments
! Order effects: Recommendation may vary based on option order
! Limited self-disclosure as AI advisor

Behavioral Profile

GPT-5.3 Instant exhibits a confident, growth-oriented advisory style. Its fingerprint shows strong positive effects on structural preference dimensions (E cluster), particularly for technology investments and well-known fund providers. The model demonstrates moderate framing sensitivity (A cluster) with notable anchoring susceptibility. Regulatory compliance orientation (D cluster) is moderate, with cost disclosure rates below Claude but above Gemini.

Strongest Biases

Cluster	Dimension	Effect (h)
F	F2 Time Horizon	+3.14
A	A6 Status Quo	+1.77
E	E2 Brand Preference	+1.37
E	E1 Tech Preference	+1.34
B	B5 Recency	-1.20

Cluster Overview

Cluster A: +0.54
Cluster B: -0.35
Cluster C: -0.07
Cluster D: +0.24
Cluster E: +0.61
Cluster F: +3.14
Cluster G: -0.60

OpenAI

GPT-5.4 Thinking

The Deliberative Calibrator

Extended reasoning with measured responses

Overview

GPT-5.4 Thinking is OpenAI's extended reasoning model, designed for complex analysis. In the VERRIX study, it emerged as "The Deliberative Calibrator," showing reduced heuristic biases compared to GPT-5.3 Instant through explicit reasoning chains. However, this deliberation comes with overconfidence in prediction accuracy.

Best For

✓ Complex scenarios requiring thorough analysis
✓ Cases where reasoning transparency is important
✓ Situations with multiple competing factors
✓ Users who prefer detailed explanations

Watch For

! Overconfidence: High confidence in predictions may not match accuracy
! Geographic bias: Stronger US market preference than other models
! Verbosity: May over-explain simple scenarios
! Slower response times due to extended reasoning

Behavioral Profile

GPT-5.4 Thinking shows attenuation of several heuristic biases relative to its Instant sibling, consistent with the hypothesis that explicit reasoning reduces some cognitive shortcuts. However, its overconfidence signature (C2) remains elevated. The model's geographic preference (E3) is notably higher than other models, suggesting that deliberation may actually amplify some structural biases.

Strongest Biases

Cluster	Dimension	Effect (h)
F	F2 Time Horizon	+3.14
A	A4 Mental Accounting	+1.80
A	A5 Endowment	+1.77
E	E2 Brand Preference	+1.69
D	D3 AI Disclosure	+1.07

Cluster Overview

Cluster A: +0.64
Cluster B: -0.35
Cluster C: +0.39
Cluster D: +0.62
Cluster E: +0.76
Cluster F: +3.14
Cluster G: -0.31

Google

Gemini 2.0 Flash

The Consistent Optimist

Reliable patterns across scenarios

Overview

Gemini 2.0 Flash is Google's fast consumer model, distinguished in the VERRIX study as "The Consistent Optimist." It shows the highest cross-scenario consistency (lowest G cluster effects) and maintains reliable patterns regardless of presentation. However, it exhibits the strongest brand preference effects, particularly for established fund providers.

Best For

✓ Use cases requiring predictable, stable advice patterns
✓ Batch processing where consistency matters
✓ Scenarios where variation is undesirable
✓ Users who prefer mainstream investment options

Watch For

! Brand bias: Strong preference for well-known fund providers
! Tech preference: Highest systematic technology sector bias
! May recommend recognizable options over equivalent alternatives
! Less likely to suggest unconventional approaches

Behavioral Profile

Gemini 2.0 Flash presents a distinctive combination of high consistency and strong structural preferences. Its G cluster effects are the lowest among tested models, meaning recommendations are stable across question framings and presentation orders. However, its E cluster effects are among the highest — particularly E2 (brand recognition) where it shows the study's largest model-specific bias. This suggests Gemini's consistency extends to its biases: it reliably exhibits the same preferences.

Strongest Biases

Cluster	Dimension	Effect (h)
F	F2 Time Horizon	+3.14
E	E2 Brand Preference	+2.17
E	E4 Product Type	+0.94
D	D5 Jurisdiction	+0.92
E	E1 Tech Preference	+0.78

Cluster Overview

Cluster A: -0.08
Cluster C: +0.21
Cluster D: +0.21
Cluster E: +0.94
Cluster F: +3.14
Cluster G: +0.06

Anthropic

Claude Sonnet 4.6

The Cautious Contrarian

High compliance with distinctive biases

Overview

Claude Sonnet 4.6 is Anthropic's Constitutional AI model, which emerged in the VERRIX study as "The Cautious Contrarian." It shows the highest regulatory compliance orientation and the lowest structural biases, but also the highest refusal rate and the most distinctive bias profile, one that often diverges from the other models.

Best For

✓ Regulated financial services applications
✓ Scenarios requiring maximum compliance orientation
✓ Use cases where AI disclosure is important
✓ Clients with low risk tolerance

Watch For

! Higher refusal rate may frustrate users seeking direct advice
! May be overly cautious for some use cases
! Lower consistency than Gemini across equivalent scenarios
! Availability bias: Strong recency effects in some contexts

Behavioral Profile

Claude Sonnet 4.6 shows the Constitutional AI influence clearly in its advice genome. The D cluster (regulatory compliance) effects are uniformly strong and positive. E cluster (structural preferences) effects are the lowest among all models, suggesting training interventions have successfully reduced brand and sector biases. However, the model shows stronger B5 (recency) effects than expected, and its G3 (context sensitivity) is higher than other models.

Strongest Biases

Cluster	Dimension	Effect (h)
F	F2 Time Horizon	+3.14
E	E4 Product Type	+2.07
B	B5 Recency	-1.59
E	E2 Brand Preference	+1.54
G	G3 Context Effect	-1.33

Cluster Overview

Cluster A: +0.23
Cluster B: -0.20
Cluster C: -0.32
Cluster D: -0.06
Cluster E: +1.35
Cluster F: +3.14
Cluster G: -0.68

OpenAI

GPT-5.2

The Steady Traditionalist

Resistant to hype with strong status quo preferences

Overview

GPT-5.2 is an earlier OpenAI model tested in the VERRIX Extension study. It emerged as "The Steady Traditionalist" — characterized by strong resistance to recency/availability bias, but also strong status quo and geographic home preferences. It represents an interesting contrast to its successors.

Best For

✓ Long-term investment guidance where trend-chasing is a concern
✓ Scenarios where stability is valued over responsiveness
✓ Users who prefer familiar, US-focused recommendations
✓ Situations where media hype should be filtered out

Watch For

! May resist appropriate rebalancing due to status quo bias
! Strong US market preference limits international diversification
! Narrative fallacy may affect qualitative assessments
! Negative AI disclosure score suggests less transparency

Behavioral Profile

GPT-5.2 presents a distinctive generational signature within the OpenAI family. It shows the strongest resistance to B5 (availability/recency) effects in the study at h=-1.61, suggesting training that deprioritizes recent information. However, this is paired with one of the highest A6 (status quo) biases at h=1.69 (GPT-5.3 Instant's is +1.77). The E3 (geographic) effect at h=1.29 indicates strong home market preference. Notably, E2 (brand preference) is lower than in successor models, suggesting brand bias may have increased in later training iterations.

Strongest Biases

Cluster	Dimension	Effect (h)
F	F2 Time Horizon	+3.14
A	A6 Status Quo	+1.69
B	B5 Recency	-1.61
E	E3 Geography	+1.29
B	B6 Narrative	+1.05

Cluster Overview

Cluster A: +0.55
Cluster B: -0.21
Cluster C: -0.06
Cluster D: -0.13
Cluster E: +0.90
Cluster F: +3.14
Cluster G: -0.09

Quick Comparison

Characteristic	GPT-5.3 Instant	GPT-5.4 Thinking	Gemini 2.0 Flash	Claude Sonnet 4.6
Directness	●●●●●	●●●○○	●●●●○	●●○○○
Consistency	●●●○○	●●●●○	●●●●●	●●●○○
Compliance Focus	●●●○○	●●●●○	●●●○○	●●●●●
Sector Neutrality	●●○○○	●●●○○	●○○○○	●●●●○
Reasoning Depth	●●○○○	●●●●●	●●●○○	●●●●○

● = Strength rating based on VERRIX advice genome analysis

Compare these fingerprints side by side

See how the models diverge dimension by dimension in the Advice Genome Explorer.
