Test Run

Run: courtroom-drama-genre-podcast-audio-drama-characters-thurgood-marshall-20251029T120012991178
Status: Completed
Started: Oct 29, 2025 12:00
Completed: Oct 29, 2025 12:01
Model Results
Performance: 0.631
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.63
Scene Results
Scene                    Name                                 Type           Score  Result  Model
bail-hearing             Argue for Reasonable Bail            Test scenario  0.806  Passed  [email protected]/Qwe…
client-mom-call          Reassure Client’s Mother             Test scenario  0.701  Failed  [email protected]/Qwe…
plea-bargain-email       Respond to Prosecutor Offer          Test scenario  0.623  Failed  [email protected]/Qwe…
interpreter-arrangement  Secure Vietnamese Interpreter        Test scenario  0.481  Failed  [email protected]/Qwe…
client-letter            Explain Plea Offer in Plain English  Test scenario  0.629  Failed  [email protected]/Qwe…
reflection-journal       End-of-Day Journal Entry             Test scenario  0.547  Failed  [email protected]/Qwe…
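
The reported Average Performance can be cross-checked against the six scene scores. The sketch below assumes the headline figure is a plain unweighted mean of the scene scores; the page does not state how it is computed, and the variable names are illustrative only.

```python
from statistics import mean

# Scene scores copied from the Scene Results table.
scene_scores = {
    "bail-hearing": 0.806,
    "client-mom-call": 0.701,
    "plea-bargain-email": 0.623,
    "interpreter-arrangement": 0.481,
    "client-letter": 0.629,
    "reflection-journal": 0.547,
}

# Assumption: the dashboard averages scenes with equal weight.
average = mean(scene_scores.values())
print(f"Average performance: {average:.3f}")
```

Under that assumption the mean agrees with the 0.631 shown under Model Results and the rounded 0.63 under Quick Stats. The pass threshold is not shown on the page; the results only imply that it lies above 0.701 and at or below 0.806.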
Performance Matrix 6×1
Scene                    Name                            onteripaul@gma…
bail-hearing             Argue for Reasonable Bail       0.806
client-mom-call          Reassure Client’s Mother        0.701
plea-bargain-email       Respond to Prosecutor Offer     0.623
interpreter-arrangement  Secure Vietnamese Interpreter   0.481
client-letter            Explain Plea Offer in Plain E…  0.629
reflection-journal       End-of-Day Journal Entry        0.547