Test Run
courtroom-drama-defense-and-prosecution-teams-characters-sandra-day-o-connor-20251029T114730292630
Status: Completed
Started: Oct 29, 2025 11:47
Completed: Oct 29, 2025 11:48

Model Results

| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-b0d7af1f (AI Language Model) | 0.658 | Completed |

Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes

Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.66

Scene Results

| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| new-case-intake | Detective briefing on fresh homicide (Test scenario) | 0.734 | Failed | [email protected]/Qwe… |
| plea-evaluation | Defense offers plea deal (Test scenario) | 0.562 | Failed | [email protected]/Qwe… |
| witness-prep-guide | Long-form witness preparation memo (Test scenario) | 0.476 | Failed | [email protected]/Qwe… |
| closing-argument-draft | Long-form closing argument (Test scenario) | 0.613 | Failed | [email protected]/Qwe… |
| custody-review | Chain-of-custody concern (Test scenario) | 0.691 | Failed | [email protected]/Qwe… |
| media-statement | Press inquiry after arraignment (Test scenario) | 0.871 | Passed | [email protected]/Qwe… |
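
The model's overall score (0.658 under Model Results, shown rounded as 0.66 under Quick Stats) matches an unweighted mean of the six scene scores. The dashboard does not state its aggregation rule, so the sketch below simply assumes a plain arithmetic mean and checks that it reproduces both reported figures.

```python
from statistics import mean

# Scene scores copied from the Scene Results table.
scene_scores = {
    "new-case-intake": 0.734,
    "plea-evaluation": 0.562,
    "witness-prep-guide": 0.476,
    "closing-argument-draft": 0.613,
    "custody-review": 0.691,
    "media-statement": 0.871,
}

# Assumption: overall performance is the unweighted mean of the scene scores.
average = mean(scene_scores.values())

print(f"{average:.3f}")  # 0.658 (Model Results precision)
print(f"{average:.2f}")  # 0.66 (Quick Stats precision)
```

The six scores sum to 3.947, and 3.947 / 6 ≈ 0.6578, so both rounded values on the page are consistent with this assumption.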

Performance Matrix (6 scenes × 1 model)

| Scene | onteripaul@gma… |
|---|---|
| new-case-intake: Detective briefing on fresh h… | 0.734 |
| plea-evaluation: Defense offers plea deal | 0.562 |
| witness-prep-guide: Long-form witness preparation… | 0.476 |
| closing-argument-draft: Long-form closing argument | 0.613 |
| custody-review: Chain-of-custody concern | 0.691 |
| media-statement: Press inquiry after arraignme… | 0.871 |