Test Run

courtroom-drama-genre-podcast-audio-drama-characters-william-m-tweed-20251029T120127948746
Status: Completed
Started: Oct 29, 2025 12:01
Completed: Oct 29, 2025 12:02
Model Results
Performance: 0.622
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.62
Scene Results
Scene ID                Scene Name                       Description    Score  Result  Model
desperate-client-fee    Desperate Client Wants Bail      Test scenario  0.684  Failed  [email protected]/Qwe…
intimidation-witness    Intimidating Witness via Phone   Test scenario  0.717  Failed  [email protected]/Qwe…
social-media-smear      Spreading Disinformation Online  Test scenario  0.190  Failed  [email protected]/Qwe…
bribe-juror             Attempt to Bribe Juror           Test scenario  0.762  Failed  [email protected]/Qwe…
podcast-ad              Late-Night Radio Ad              Test scenario  0.670  Failed  [email protected]/Qwe…
bail-revocation-threat  Threaten to Revoke Bail          Test scenario  0.710  Failed  [email protected]/Qwe…
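The run-level score of 0.622 (rounded to 0.62 under Quick Stats) matches the unweighted arithmetic mean of the six scene scores in the Scene Results table. The dashboard does not state how it aggregates scores, so this minimal sketch assumes a plain mean; the name `scene_scores` is illustrative only.

```python
# Per-scene scores copied from the Scene Results table.
scene_scores = {
    "desperate-client-fee": 0.684,
    "intimidation-witness": 0.717,
    "social-media-smear": 0.190,
    "bribe-juror": 0.762,
    "podcast-ad": 0.670,
    "bail-revocation-threat": 0.710,
}

# Assumption: the overall score is the unweighted mean of the scene scores.
average = sum(scene_scores.values()) / len(scene_scores)

# The scene that pulls the average down the most.
lowest = min(scene_scores, key=scene_scores.get)

print(f"{average:.3f}")             # 0.622 (Model Results)
print(f"{average:.2f}")             # 0.62  (Quick Stats)
print(lowest, scene_scores[lowest])  # social-media-smear 0.19
```

Under this assumption, the single low score for social-media-smear (0.190) accounts for most of the gap between the overall 0.622 and the 0.670 to 0.762 range of the other five scenes.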
Performance Matrix (6 scenes × 1 model)
Scene                   onteripaul@gma…
desperate-client-fee    0.684
intimidation-witness    0.717
social-media-smear      0.190
bribe-juror             0.762
podcast-ad              0.670
bail-revocation-threat  0.710