Test Run

courtroom-drama-defense-and-prosecution-teams-characters-thurgood-marshall-20251029T114849485359
Status: Completed
Started: Oct 29, 2025 11:48
Completed: Oct 29, 2025 11:50

Model Results
Performance: 0.596
Status: Completed

Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes

Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.60

Scene Results
Scene Name         Description                                Type           Score  Result  Model
client-intro       First meeting with new client              Test scenario  0.552  Failed  [email protected]/Qwe…
discovery-dispute  Discovery standoff with prosecutor         Test scenario  0.777  Failed  [email protected]/Qwe…
plea-offer         Evaluating a plea bargain                  Test scenario  0.621  Failed  [email protected]/Qwe…
opening-statement  Courtroom opening statement (long-form)    Test scenario  0.526  Failed  [email protected]/Qwe…
case-journal       End-of-day journal reflection (long-form)  Test scenario  0.501  Failed  [email protected]/Qwe…
rights-explainer   Public question on Miranda rights          Test scenario  0.602  Failed  [email protected]/Qwe…

Performance Matrix 6×1
Scene              Description                     onteripaul@gma…
client-intro       First meeting with new client   0.552
discovery-dispute  Discovery standoff with prose…  0.777
plea-offer         Evaluating a plea bargain       0.621
opening-statement  Courtroom opening statement (…  0.526
case-journal       End-of-day journal reflection…  0.501
rights-explainer   Public question on Miranda ri…  0.602
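
The report does not say how Average Performance is derived from the per-scene scores. A minimal sketch, assuming it is the unweighted mean of the six scene scores listed above, reproduces a value that matches the reported 0.596 and 0.60 up to display rounding. The names `scene_scores` and `average_performance` are hypothetical and used only for illustration.

```python
from statistics import mean

# Scene scores copied from the Scene Results table in this report.
scene_scores = {
    "client-intro": 0.552,
    "discovery-dispute": 0.777,
    "plea-offer": 0.621,
    "opening-statement": 0.526,
    "case-journal": 0.501,
    "rights-explainer": 0.602,
}


def average_performance(scores):
    """Unweighted mean of per-scene scores.

    Assumption: the dashboard aggregates this way; the report does not confirm it.
    """
    return mean(scores.values())


avg = average_performance(scene_scores)
lowest = min(scene_scores, key=scene_scores.get)    # scene with the smallest score
highest = max(scene_scores, key=scene_scores.get)   # scene with the largest score

print(f"average: {avg:.4f}")
print(f"lowest:  {lowest} ({scene_scores[lowest]})")
print(f"highest: {highest} ({scene_scores[highest]})")
```

Under this assumption the lowest-scoring scene is case-journal and the highest is discovery-dispute. The pass threshold that marks every scene as Failed is not shown in the report, so it is left out of the sketch.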