Test Run

agent-victor-hargrove-billionaire-v1-20251010T095837106763
Status: Completed
Started: Oct 10, 2025 09:58
Completed: Oct 10, 2025 10:00
Model Results
Performance: 0.532
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 8
Average Performance: 0.53
Scene Results
Scene | Title | Score | Result | Model
penthouse_midnight_rant | Midnight Rant from the Penthouse | 0.000 | Failed | [email protected]/Qwe…
hostile_boardroom_confrontation | Boardroom Power Play | 0.859 | Passed | [email protected]/Qwe…
charity_gala_disaster | Charity Gala Meltdown | 0.815 | Passed | [email protected]/Qwe…
late_night_drive_confession | Late Night Drive Through Forgotten Streets | 0.885 | Passed | [email protected]/Qwe…
estranged_child_confrontation | Confrontation with Estranged Child | 0.000 | Failed | [email protected]/Qwe…
anonymous_donation_dilemma | Anonymous Donation Under Scrutiny | 0.000 | Failed | [email protected]/Qwe…
therapist_breakdown | Therapist Session Breakdown | 0.869 | Passed | [email protected]/Qwe…
sister_memorial_reflection | Visiting Sister’s Memorial | 0.828 | Passed | [email protected]/Qwe…
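The average reported under Quick Stats can be cross-checked against the eight per-scene scores in Scene Results. The sketch below assumes the dashboard takes an unweighted mean over all scenes, with failed scenes contributing their 0.000 score; the "score above zero means passed" rule is also an assumption that happens to match this run's labels, not a documented threshold.

```python
# Per-scene scores copied from the Scene Results table of this run.
scene_scores = {
    "penthouse_midnight_rant": 0.000,
    "hostile_boardroom_confrontation": 0.859,
    "charity_gala_disaster": 0.815,
    "late_night_drive_confession": 0.885,
    "estranged_child_confrontation": 0.000,
    "anonymous_donation_dilemma": 0.000,
    "therapist_breakdown": 0.869,
    "sister_memorial_reflection": 0.828,
}

# Assumption: the reported average is the unweighted mean over all scenes,
# so each failed scene pulls the average down with its 0.000 score.
average = sum(scene_scores.values()) / len(scene_scores)

# Assumption: a scene counts as passed when its score is above zero.
# This matches the Passed/Failed labels here but is not a documented rule.
passed = [name for name, score in scene_scores.items() if score > 0]

print(f"average = {average:.3f}")
print(f"passed = {len(passed)}/{len(scene_scores)}")
```

Under these assumptions the mean comes out to 0.532, matching the Model Results figure (shown rounded to 0.53 in Quick Stats), with 5 of 8 scenes passing.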
Performance Matrix 8×1
Scene | onteripaul@gma…
penthouse_midnight_rant | 0.000
hostile_boardroom_confrontation | 0.859
charity_gala_disaster | 0.815
late_night_drive_confession | 0.885
estranged_child_confrontation | 0.000
anonymous_donation_dilemma | 0.000
therapist_breakdown | 0.869
sister_memorial_reflection | 0.828