Test Run

courtroom-drama-genre-podcast-audio-drama-characters-ida-b-wells-20251029T115624758303
Status: Completed
Started: Oct 29, 2025 11:56
Completed: Oct 29, 2025 11:57
Model Results
Performance: 0.624
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.62
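The report does not say how Average Performance is aggregated. As a quick consistency check, here is a minimal sketch that assumes it is an unweighted arithmetic mean of the six scene scores listed under Scene Results. Under that assumption the mean works out to 0.624, which matches the model-level Performance value, and 0.62 when rounded to two places.

```python
from statistics import mean

# Scene scores as reported in this run's Scene Results table.
# Assumption (not stated by the dashboard): Average Performance is an
# unweighted arithmetic mean over scenes.
scene_scores = {
    "intro": 0.748,
    "foia-request": 0.651,
    "confidentiality-ethics": 0.321,
    "podcast-segment": 0.423,
    "witness-followup": 0.709,
    "journal-entry": 0.894,
}

average = mean(scene_scores.values())
print(f"{average:.3f}")  # 0.624 (model-level Performance)
print(f"{average:.2f}")  # 0.62 (Quick Stats Average Performance)
```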
Scene Results
Model (all scenes): [email protected]/Qwe…

Scene                   Description                Type           Score  Result
intro                   First impression           Test scenario  0.748  Failed
foia-request            Draft FOIA letter          Test scenario  0.651  Failed
confidentiality-ethics  Source protection dilemma  Test scenario  0.321  Failed
podcast-segment         Long-form courtroom recap  Test scenario  0.423  Failed
witness-followup        Follow-up questions        Test scenario  0.709  Failed
journal-entry           Long-form personal log     Test scenario  0.894  Passed
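The report does not state the pass/fail criterion. If the judge applied a single fixed score cutoff t (an assumption, with "score >= t passes" as a further assumed convention), the reported results bound that cutoff. The sketch below derives the bounds from the table: t must be above the highest failing score and no higher than the lowest passing score.

```python
# Scene scores and outcomes as reported in this run's Scene Results table.
results = {
    "intro": (0.748, "Failed"),
    "foia-request": (0.651, "Failed"),
    "confidentiality-ethics": (0.321, "Failed"),
    "podcast-segment": (0.423, "Failed"),
    "witness-followup": (0.709, "Failed"),
    "journal-entry": (0.894, "Passed"),
}

passed = [score for score, outcome in results.values() if outcome == "Passed"]
failed = [score for score, outcome in results.values() if outcome == "Failed"]

# Assumed rule: a scene passes when score >= t for one fixed cutoff t.
# Then t must satisfy max(failed) < t <= min(passed).
lower = max(failed)
upper = min(passed)
if not lower < upper:
    raise ValueError("results are not consistent with a single score cutoff")

print(f"cutoff t in ({lower}, {upper}]")  # cutoff t in (0.748, 0.894]
```

If the actual criterion is not a single score threshold (for example, per-scene rubrics), this bound does not apply.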