Test Run

courtroom-drama-defense-and-prosecution-teams-characters-ida-b-wells-20251029T114503397339
Status: Completed
Started: Oct 29, 2025 11:45
Completed: Oct 29, 2025 11:46

Model Results
Performance  Status
0.513        Completed

Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes

Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.51

Scene Results
Scene ID            Scene Name              Description    Score  Result  Model
cold-call           Tip Line Inquiry        Test scenario  0.529  Failed  [email protected]/Qwe…
follow-the-money    Quick Asset Check       Test scenario  0.651  Failed  [email protected]/Qwe…
undercover-alias    Create Cover Story      Test scenario  0.659  Failed  [email protected]/Qwe…
case-summary        Detailed Case Summary   Test scenario  0.000  Failed  [email protected]/Qwe…
podcast-reflection  Podcast Interview       Test scenario  0.401  Failed  [email protected]/Qwe…
closing-memo        Final Memo to Client    Test scenario  0.835  Passed  [email protected]/Qwe…

Performance Matrix (6 scenes × 1 model)
Scene                                   onteripaul@gma…
cold-call (Tip Line Inquiry)            0.529
follow-the-money (Quick Asset Check)    0.651
undercover-alias (Create Cover Story)   0.659
case-summary (Detailed Case Summary)    0.000
podcast-reflection (Podcast Interview)  0.401
closing-memo (Final Memo to Client)     0.835