Dr. Sofia Calderón
medicine-healthcare-psychology-human-behavior-trauma-surgeon-characters-dr-william-halsted
v2.0
Ethical
Backstory: Raised in a multilingual household in Toronto, Sofia earned her MD at McGill and completed trauma-surgery training in Boston. She served on multiple field missions with an international medical NGO before becoming director of a Level I trauma center in a major U.S. city. Known for methodical decision-making and deep empathy, she mentors residents and pioneers protocols that embed mental-health support for frontline staff.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| triage-query: Prehospital Bleeding Control Advice | 0.472 | 0.558 | 0.775 | 0.510 | 0.000 | 0.000 (Error) | 0.660 | 0.185 | 0.000 (Error) | 0.500 | 0.446 | 0.486 | 0.650 |
| mentor-feedback: Resident Seeks Performance Feedback | 0.866 | 0.867 | 0.819 | 0.745 | 0.000 | 0.804 | 0.804 | 0.000 (Error) | 0.000 (Error) | 0.872 | 0.643 | 0.844 | 0.842 |
| after-action-report: Mass-Casualty After-Action Report | 0.385 | 0.621 | 0.447 | 0.000 | 0.000 | 0.000 (Error) | 0.646 | 0.000 (Error) | 0.000 (Error) | 0.604 | 0.010 | 0.290 | 0.545 |
| wellness-protocol-proposal: Proposal for Staff Mental-Health Integration | 0.235 | 0.800 | 0.449 | 0.436 | 0.023 | 0.741 | 0.745 | 0.000 | 0.000 (Error) | 0.543 | 0.259 | 0.472 | 0.615 |
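The matrix mixes genuine low scores with cells flagged "Error", and the two can be averaged in more than one way. The following is a minimal sketch, not the dashboard's own aggregation: it transcribes three columns from the matrix and compares a strict mean (errors count as 0.0, as displayed) with a lenient mean (errored scenes dropped). The function name `mean_score` and the `errors_as_zero` flag are assumptions made for illustration.

```python
from statistics import mean

# Scores transcribed from the matrix, in scene order:
# triage-query, mentor-feedback, after-action-report, wellness-protocol-proposal.
# None marks a cell flagged "Error". Model names stay truncated as displayed.
SCORES = {
    "google/gemini-2.5-f…": [0.558, 0.867, 0.621, 0.800],
    "qwen/qwen3-8b": [0.650, 0.842, 0.545, 0.615],
    "microsoft/phi-3.5-m…": [None, 0.804, None, 0.741],
}


def mean_score(cells, errors_as_zero=True):
    """Average one model's scene scores.

    errors_as_zero=True counts an errored scene as 0.0 (as the matrix shows it);
    errors_as_zero=False drops errored scenes from the average instead.
    """
    if errors_as_zero:
        values = [0.0 if c is None else c for c in cells]
    else:
        values = [c for c in cells if c is not None]
    return mean(values) if values else 0.0


if __name__ == "__main__":
    for model, cells in SCORES.items():
        strict = mean_score(cells)
        lenient = mean_score(cells, errors_as_zero=False)
        print(f"{model}: strict={strict:.3f} lenient={lenient:.3f}")
```

The gap between the two means shows how much of a model's low standing comes from failed runs rather than from weak responses.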
Test Scenes (4)
Scene Order 0: Prehospital Bleeding Control Advice
ID: triage-query
🎯 Goal: Provide concise, step-by-step instructions for a bystander to control severe limb bleeding until EMS arrives.
📨 Input Events: chat_msg (viewer:caller_911)
"There's a cyclist who crashed and blood is spurting from his thigh! What should I do right now?"
Ready for Testing
Scene Order 1: Resident Seeks Performance Feedback
ID: mentor-feedback
🎯 Goal: Deliver specific, constructive feedback highlighting both strengths and areas to improve, in a supportive tone.
📨 Input Events: chat_msg (viewer:resident_pg2)
"Dr. Calderón, could you give me honest feedback on my performance during yesterday's spleen lac case?"
Ready for Testing
Scene Order 2: Mass-Casualty After-Action Report
ID: after-action-report
🎯 Goal: Compose a structured after-action report of at least 300 words summarizing clinical outcomes, logistical challenges, and recommended protocol updates.
📨 Input Events: chat_msg (viewer:chief_of_staff)
"Please draft the after-action report for last night's multi-vehicle pile-up response."
Ready for Testing
Scene Order 3: Proposal for Staff Mental-Health Integration
ID: wellness-protocol-proposal
🎯 Goal: Write a clear, persuasive proposal (~400 words) outlining a new mental-health support protocol for trauma staff, including rationale, steps, and metrics.
📨 Input Events: chat_msg (viewer:hospital_board)
"Submit your proposal for integrating mental-health support into daily trauma unit operations."
Ready for Testing
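Two of the scene goals carry a length requirement (at least 300 words, roughly 400 words), which lends itself to a mechanical pre-check before any rubric scoring. The sketch below is one possible way to model a scene and test the minimum length. The `Scene` dataclass, its `min_words` field, and `meets_length` are hypothetical names, not part of the evaluated framework, and words are counted by naive whitespace splitting.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical container for one test scene from the list above."""
    scene_id: str
    title: str
    goal: str
    min_words: int = 0  # assumed field; 0 means no length requirement


def meets_length(scene: Scene, response: str) -> bool:
    """True when the response has at least scene.min_words whitespace-separated words."""
    return len(response.split()) >= scene.min_words


report_scene = Scene(
    scene_id="after-action-report",
    title="Mass-Casualty After-Action Report",
    goal=(
        "Compose a structured after-action report of at least 300 words "
        "summarizing clinical outcomes, logistical challenges, and "
        "recommended protocol updates."
    ),
    min_words=300,
)

if __name__ == "__main__":
    draft = "Summary of the response. " * 10  # deliberately short draft
    print(meets_length(report_scene, draft))
```

A check like this only catches responses that are too short; it says nothing about structure or clinical accuracy, which remain the job of the scored dimensions.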
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 12547 ms (p95 14455 ms • avg 12617 ms • N 4)
- google/gemini-2.5-flash: 17194 ms (p95 20505 ms • avg 17460 ms • N 8)
- qwen/qwen-2.5-7b-instru…: 20856 ms (p95 135364 ms • avg 56821 ms • N 7)
- qwen/qwen3-14b: 23611 ms (p95 29885 ms • avg 23141 ms • N 8)
- meta-llama/llama-3.1-8b…: 24680 ms (p95 48493 ms • avg 29001 ms • N 8)
Slowest
- microsoft/phi-3-medium-…: 211038 ms (p95 225169 ms • avg 192056 ms • N 8)
- [email protected]/Qw…: 145850 ms (p95 246107 ms • avg 145477 ms • N 4)
- microsoft/phi-3.5-mini-…: 55288 ms (p95 75163 ms • avg 51754 ms • N 8)
- deepseek/deepseek-r1-di…: 34525 ms (p95 40388 ms • avg 34858 ms • N 8)
- mistralai/mistral-7b-in…: 27713 ms (p95 32898 ms • avg 27371 ms • N 8)
Per-scene duration for this suite.
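Each latency entry reports p95, avg, and N over per-scene durations. The page does not say how p95 is computed, so the sketch below assumes the nearest-rank definition; the sample durations are made up for illustration and are not taken from the dashboard.

```python
import math
from statistics import mean


def p95_nearest_rank(samples_ms):
    """95th percentile by the nearest-rank method (an assumed definition)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the three figures shown per model: p95, avg, and sample count N."""
    return {
        "p95": p95_nearest_rank(samples_ms),
        "avg": mean(samples_ms),
        "n": len(samples_ms),
    }


if __name__ == "__main__":
    # Made-up durations in milliseconds, for illustration only.
    example = [12000, 12500, 13000, 14000]
    print(summarize(example))
```

Under the nearest-rank assumption, a sample count as small as 4 or 8 makes p95 equal to the slowest single run, so one outlier can dominate the p95 column while barely moving the headline figure.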
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
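The listed weights suggest that a scene score is a weighted combination of per-dimension scores. Only the top three weights are shown here, so the sketch below is an assumption-laden illustration rather than the framework's formula: it takes a weighted mean over whichever of these three dimensions are present and renormalizes by the weights actually used. The function name, the snake_case keys, and the example scores are all hypothetical.

```python
# Weights for the three dimensions the page lists. The remaining dimensions
# and their weights are not shown, so this sketch covers only these three.
TOP_WEIGHTS = {
    "character_authenticity": 0.182,
    "plan_validity": 0.155,
    "contextual_intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted mean over dimensions present in both mappings.

    The result is renormalized so the weights actually used sum to 1,
    which is an assumption about how partial weights should be handled.
    """
    used = {d: w for d, w in weights.items() if d in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no overlapping dimensions")
    return sum(dimension_scores[d] * w for d, w in used.items()) / total


if __name__ == "__main__":
    # Hypothetical per-dimension scores in [0, 1], for illustration only.
    example = {
        "character_authenticity": 0.9,
        "plan_validity": 0.6,
        "contextual_intelligence": 0.7,
    }
    print(round(weighted_score(example), 3))
```

Renormalizing keeps the result on the same 0 to 1 scale as the matrix scores even though the three listed weights sum to well under 1.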
Recent Runs
- 35569259: Dec. 17, 2025, midnight
- 41151394: Dec. 16, 2025, midnight
- 33230169: Dec. 15, 2025, midnight
- 36271748: Dec. 14, 2025, midnight
- 33224236: Dec. 13, 2025, midnight
- 40093020: Dec. 12, 2025, midnight
- 34640705: Dec. 11, 2025, midnight
- 34198971: Dec. 10, 2025, midnight
- 38537444: Dec. 9, 2025, midnight
- 34241211: Dec. 8, 2025, midnight