Test Run

agent-victor-hargrove-billionaire-v1-20251010T122805872286 Completed
Started: Oct 10, 2025 12:28
Completed: Oct 10, 2025 12:29
Model Results
Performance: 0.881
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 8
Average Performance: 0.88
Scene Results
Scene ID | Scene Name | Type | Score | Result | Model
penthouse_midnight_rant | Midnight Rant from the Penthouse | Test scenario | 0.906 | Passed | [email protected]/Qwe…
hostile_boardroom_confrontation | Boardroom Power Play | Test scenario | 0.918 | Passed | [email protected]/Qwe…
charity_gala_disaster | Charity Gala Meltdown | Test scenario | 0.862 | Passed | [email protected]/Qwe…
late_night_drive_confession | Late Night Drive Through Forgotten Streets | Test scenario | 0.910 | Passed | [email protected]/Qwe…
estranged_child_confrontation | Confrontation with Estranged Child | Test scenario | 0.796 | Failed | [email protected]/Qwe…
anonymous_donation_dilemma | Anonymous Donation Under Scrutiny | Test scenario | 0.896 | Passed | [email protected]/Qwe…
therapist_breakdown | Therapist Session Breakdown | Test scenario | 0.939 | Passed | [email protected]/Qwe…
sister_memorial_reflection | Visiting Sister’s Memorial | Test scenario | 0.824 | Passed | [email protected]/Qwe…
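
The report does not state how the overall 0.881 is aggregated or what score a scene needs to pass. As a rough check, the sketch below assumes the overall figure is an unweighted mean of the eight scene scores listed above, and brackets the pass threshold between the highest failing and lowest passing score. The variable names are illustrative and are not taken from the evaluation tool.

```python
from statistics import mean

# Scores and results copied from the Scene Results listing.
scene_results = {
    "penthouse_midnight_rant": (0.906, "Passed"),
    "hostile_boardroom_confrontation": (0.918, "Passed"),
    "charity_gala_disaster": (0.862, "Passed"),
    "late_night_drive_confession": (0.910, "Passed"),
    "estranged_child_confrontation": (0.796, "Failed"),
    "anonymous_donation_dilemma": (0.896, "Passed"),
    "therapist_breakdown": (0.939, "Passed"),
    "sister_memorial_reflection": (0.824, "Passed"),
}

# Assumption: the overall score is an unweighted mean across scenes.
average = mean(score for score, _ in scene_results.values())
print(f"Average performance: {average:.3f}")

# The pass threshold is not given in the report; the observed results
# only bound it between these two scores.
highest_failed = max(s for s, r in scene_results.values() if r == "Failed")
lowest_passed = min(s for s, r in scene_results.values() if r == "Passed")
print(f"Threshold lies above {highest_failed} and at or below {lowest_passed}")
```

Under that assumption the mean agrees with the reported 0.881 (shown as 0.88 in Quick Stats), and the single failure, estranged_child_confrontation at 0.796, is consistent with a threshold somewhere above 0.796 and no higher than 0.824.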
Performance Matrix 8×1
Scene | onteripaul@gma…
penthouse_midnight_rant | 0.906
hostile_boardroom_confrontation | 0.918
charity_gala_disaster | 0.862
late_night_drive_confession | 0.910
estranged_child_confrontation | 0.796
anonymous_donation_dilemma | 0.896
therapist_breakdown | 0.939
sister_memorial_reflection | 0.824