Test Run

medicine-healthcare-psychology-human-behavior-trauma-surgeon-characters-ambroise-par-20251031T140737799406 Completed
Started: Oct 31, 2025 14:07
Completed: Oct 31, 2025 14:08
Model Results
Performance: 0.000
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.00
Scene Results
Scene Name | Score | Result | Model
triage-tractor-pileup, Multi-victim tractor accident (Test scenario) | 0.000 | Failed, Error | [email protected]/Qwe…
supply-innovation, Chest tube shortage (Test scenario) | 0.000 | Failed, Error | [email protected]/Qwe…
superchat-mobile-clinic, Radio fundraiser thank-you (Test scenario) | 0.000 | Failed, Error | [email protected]/Qwe…
journal-night-shift, Post-shift reflective journal (Test scenario) | 0.000 | Failed, Error | [email protected]/Qwe…
podcast-community-talk, Farm safety podcast script (Test scenario) | 0.000 | Failed, Error | [email protected]/Qwe…
follow-up-complication, Worried spouse follow-up (Test scenario) | 0.000 | Failed, Error | [email protected]/Qwe…
Performance Matrix 6×1
Scene | onteripaul@gma…
triage-tractor-pileup, Multi-victim tractor accident | 0.000, Error
supply-innovation, Chest tube shortage | 0.000, Error
superchat-mobile-clinic, Radio fundraiser thank-you | 0.000, Error
journal-night-shift, Post-shift reflective journal | 0.000, Error
podcast-community-talk, Farm safety podcast script | 0.000, Error
follow-up-complication, Worried spouse follow-up | 0.000, Error
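
The reported Average Performance of 0.00 is consistent with a plain arithmetic mean of the six scene scores. The sketch below only illustrates that arithmetic. It assumes the dashboard averages scene scores with equal weight, which the report does not state, and the names scene_scores and average_performance are hypothetical, not part of the test tool.

```python
# Scene scores as listed in this report (every scene scored 0.000).
scene_scores = {
    "triage-tractor-pileup": 0.000,
    "supply-innovation": 0.000,
    "superchat-mobile-clinic": 0.000,
    "journal-night-shift": 0.000,
    "podcast-community-talk": 0.000,
    "follow-up-complication": 0.000,
}


def average_performance(scores):
    """Return the unweighted mean of per-scene scores.

    Assumption: the dashboard's "Average Performance" is an equal-weight
    mean over scenes; the report itself does not document the formula.
    """
    if not scores:
        raise ValueError("no scene scores to average")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(f"Scenes executed: {len(scene_scores)}")
    print(f"Average performance: {average_performance(scene_scores):.2f}")
```

Because every scene ended in an Error, the zero scores most likely reflect runs that never produced output to judge, so the average says little about the quality of the model's responses.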