Test Run
medicine-healthcare-psychology-human-behavior-trauma-surgeon-characters-ambroise-par-20251031T172206019299
Status: Completed
Started: Oct 31, 2025 17:22
Completed: Oct 31, 2025 17:22
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-14B-984c85c4 (AI Language Model) | 0.000 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.00
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| triage-tractor-pileup | Multi-victim tractor accident (test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe… |
| supply-innovation | Chest tube shortage (test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe… |
| superchat-mobile-clinic | Radio fundraiser thank-you (test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe… |
| journal-night-shift | Post-shift reflective journal (test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe… |
| podcast-community-talk | Farm safety podcast script (test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe… |
| follow-up-complication | Worried spouse follow-up (test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe… |
Performance Matrix 6×1
| Scene | onteripaul@gma… |
|---|---|
| triage-tractor-pileup (Multi-victim tractor accident) | 0.000 (Error) |
| supply-innovation (Chest tube shortage) | 0.000 (Error) |
| superchat-mobile-clinic (Radio fundraiser thank-you) | 0.000 (Error) |
| journal-night-shift (Post-shift reflective journal) | 0.000 (Error) |
| podcast-community-talk (Farm safety podcast script) | 0.000 (Error) |
| follow-up-complication (Worried spouse follow-up) | 0.000 (Error) |