Test Run
courtroom-drama-genre-podcast-audio-drama-characters-thurgood-marshall-20251029T120012991178
Status: Completed
Started: Oct 29, 2025 12:00
Completed: Oct 29, 2025 12:01
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-b0d7af1f (AI Language Model) | 0.631 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.63
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| bail-hearing | Argue for Reasonable Bail (test scenario) | 0.806 | Passed | [email protected]/Qwe… |
| client-mom-call | Reassure Client’s Mother (test scenario) | 0.701 | Failed | [email protected]/Qwe… |
| plea-bargain-email | Respond to Prosecutor Offer (test scenario) | 0.623 | Failed | [email protected]/Qwe… |
| interpreter-arrangement | Secure Vietnamese Interpreter (test scenario) | 0.481 | Failed | [email protected]/Qwe… |
| client-letter | Explain Plea Offer in Plain English (test scenario) | 0.629 | Failed | [email protected]/Qwe… |
| reflection-journal | End-of-Day Journal Entry (test scenario) | 0.547 | Failed | [email protected]/Qwe… |
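
The reported model performance (0.631, shown as 0.63 under Quick Stats) is consistent with an unweighted mean of the six scene scores. The report does not state how the figure is aggregated, so the sketch below is only a check under that assumption; `scene_scores` is an illustrative name, not part of the tool.

```python
from statistics import mean

# Scene scores copied from the Scene Results table.
scene_scores = {
    "bail-hearing": 0.806,
    "client-mom-call": 0.701,
    "plea-bargain-email": 0.623,
    "interpreter-arrangement": 0.481,
    "client-letter": 0.629,
    "reflection-journal": 0.547,
}

# Assumption: overall performance is the plain (unweighted) mean.
average = mean(scene_scores.values())

print(f"Average performance: {average:.3f}")
```

Rounded to three decimals the mean matches the 0.631 in Model Results, and rounded to two it matches the 0.63 in Quick Stats.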
Performance Matrix 6×1
| Scene | onteripaul@gma… |
|---|---|
| bail-hearing: Argue for Reasonable Bail | 0.806 |
| client-mom-call: Reassure Client’s Mother | 0.701 |
| plea-bargain-email: Respond to Prosecutor Offer | 0.623 |
| interpreter-arrangement: Secure Vietnamese Interpreter | 0.481 |
| client-letter: Explain Plea Offer in Plain E… | 0.629 |
| reflection-journal: End-of-Day Journal Entry | 0.547 |