Test Run

Run: courtroom-drama-genre-movie-characters-florence-nightingale-20251029T115226800602
Status: Completed
Started: Oct 29, 2025 11:52
Completed: Oct 29, 2025 11:53
Model Results
Performance: 0.529
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.53
Scene Results
Scene              Name                            Type           Score  Result  Model
live-start         Kickoff of trial live blog      Test scenario  0.641  Failed  [email protected]/Qwe…
record-check       Cross-check public filings      Test scenario  0.375  Failed  [email protected]/Qwe…
anonymous-tip      Handling a confidential tip     Test scenario  0.728  Failed  [email protected]/Qwe…
long-form-summary  Midday long-form summary        Test scenario  0.420  Failed  [email protected]/Qwe…
deep-dive-thread   Extended explainer thread       Test scenario  0.325  Failed  [email protected]/Qwe…
closing-update     End-of-day correction and wrap  Test scenario  0.685  Failed  [email protected]/Qwe…
Performance Matrix 6×1
Scene              Name                            onteripaul@gma…
live-start         Kickoff of trial live blog      0.641
record-check       Cross-check public filings      0.375
anonymous-tip      Handling a confidential tip     0.728
long-form-summary  Midday long-form summary        0.420
deep-dive-thread   Extended explainer thread       0.325
closing-update     End-of-day correction and wrap  0.685
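
The reported model performance (0.529, shown as 0.53 under Quick Stats) is consistent with an unweighted mean of the six scene scores. The sketch below checks that arithmetic. It assumes simple averaging, since the report does not state how the dashboard actually aggregates scores, and the names `scene_scores` and `average_performance` are illustrative only.

```python
from statistics import mean

# Scene scores copied from the report above.
scene_scores = {
    "live-start": 0.641,
    "record-check": 0.375,
    "anonymous-tip": 0.728,
    "long-form-summary": 0.420,
    "deep-dive-thread": 0.325,
    "closing-update": 0.685,
}

# Assumption: the aggregate is a plain (unweighted) mean over scenes.
average_performance = mean(scene_scores.values())

# Lowest- and highest-scoring scenes, for a quick sanity check of the spread.
weakest = min(scene_scores, key=scene_scores.get)
strongest = max(scene_scores, key=scene_scores.get)

print(f"{average_performance:.3f}")  # three decimals, as in Model Results
print(f"{average_performance:.2f}")  # two decimals, as in Quick Stats
print(weakest, strongest)
```

Under that assumption, the three-decimal and two-decimal figures in the report are the same value at different display precision. The lowest score belongs to deep-dive-thread and the highest to anonymous-tip.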