Test Run

medicine-healthcare-psychology-human-behavior-trauma-surgeon-characters-dr-william-halsted-20251029T092755989149 Completed
Started: Oct 29, 2025 09:27
Completed: Oct 29, 2025 09:28
Model Results
Performance: 0.630
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.63
Scene Results
Scene                       Name                                          Score  Result  Model
triage-query                Prehospital Bleeding Control Advice           0.500  Failed  [email protected]/Qwe…
mentor-feedback             Resident Seeks Performance Feedback           0.872  Passed  [email protected]/Qwe…
after-action-report         Mass-Casualty After-Action Report             0.604  Failed  [email protected]/Qwe…
wellness-protocol-proposal  Proposal for Staff Mental-Health Integration  0.543  Failed  [email protected]/Qwe…
Performance Matrix 4×1
Scene                                                        onteripaul@gma…
triage-query (Prehospital Bleeding Control …)                0.500
mentor-feedback (Resident Seeks Performance Fe…)             0.872
after-action-report (Mass-Casualty After-Action Re…)         0.604
wellness-protocol-proposal (Proposal for Staff Mental-Hea…)  0.543
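
The reported average performance of 0.63 is consistent with an unweighted mean of the four scene scores. The sketch below checks that arithmetic. It assumes a simple arithmetic mean, which the report itself does not state, and the names `scene_scores` and `average_performance` are hypothetical.

```python
# Scene scores copied from the results listed in this report.
scene_scores = {
    "triage-query": 0.500,
    "mentor-feedback": 0.872,
    "after-action-report": 0.604,
    "wellness-protocol-proposal": 0.543,
}

# Assumption: the dashboard averages scenes with equal weight.
average_performance = sum(scene_scores.values()) / len(scene_scores)

# Formatted to two decimals, as in the report's summary figure.
print(f"{average_performance:.2f}")
```

The report does not state the pass/fail cutoff. From the listed results it can only be said to lie above 0.604 (failed) and at or below 0.872 (passed).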