Dr. Miguel Alvarez

medicine-healthcare-psychology-human-behavior-trauma-surgeon-characters-ambroise-par v2.0 Ethical
Backstory: A first-generation physician who left a prestigious urban residency to return to his rural hometown, Dr. Miguel Alvarez now spearheads trauma care at an under-resourced regional hospital. Known for his calm, empathetic bedside manner and knack for improvising safe solutions when supplies run low, he also organizes monthly mobile clinics that serve isolated farming communities.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene ID | Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| triage-tractor-pileup | Multi-victim tractor accident | 0.764 | 0.772 | 0.000 (Error) | 0.000 (Error) | 0.587 | 0.682 | 0.856 |
| supply-innovation | Chest tube shortage | 0.595 | 0.542 | 0.000 (Error) | 0.000 (Error) | 0.541 | 0.426 | 0.616 |
| superchat-mobile-clinic | Radio fundraiser thank-you | 0.885 | 0.714 | 0.000 (Error) | 0.000 (Error) | 0.842 | 0.916 | 0.712 |
| journal-night-shift | Post-shift reflective journal | 0.649 | 0.289 | 0.000 (Error) | 0.000 (Error) | 0.720 | 0.266 | 0.727 |
| podcast-community-talk | Farm safety podcast script | 0.651 | 0.287 | 0.000 (Error) | 0.000 (Error) | 0.481 | 0.613 | 0.000 (Error) |
| follow-up-complication | Worried spouse follow-up | 0.635 | 0.632 | 0.000 (Error) | 0.000 (Error) | 0.510 | 0.766 | 0.896 |
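The matrix above can be condensed into a per-model average. The following is a minimal sketch, not part of the dashboard itself: the scores are copied from the matrix, column labels are kept exactly as they appear (truncated or obfuscated), and the names `MODELS`, `SCORES`, and `summarize` are hypothetical. It assumes that a 0.000 cell flagged "Error" means the run failed rather than that it earned a zero, so it reports the mean both ways.

```python
from statistics import mean

# Column labels exactly as shown in the matrix. The two obfuscated columns
# share one label, so a column number is appended to tell them apart.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected] (column 3)",
    "[email protected] (column 4)",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# One row per scene; None marks a cell reported as 0.000 with "Error".
SCORES = {
    "triage-tractor-pileup": [0.764, 0.772, None, None, 0.587, 0.682, 0.856],
    "supply-innovation": [0.595, 0.542, None, None, 0.541, 0.426, 0.616],
    "superchat-mobile-clinic": [0.885, 0.714, None, None, 0.842, 0.916, 0.712],
    "journal-night-shift": [0.649, 0.289, None, None, 0.720, 0.266, 0.727],
    "podcast-community-talk": [0.651, 0.287, None, None, 0.481, 0.613, None],
    "follow-up-complication": [0.635, 0.632, None, None, 0.510, 0.766, 0.896],
}


def summarize(scores, models):
    """Return error count and two mean scores for each model column."""
    summary = {}
    for col, model in enumerate(models):
        column = [row[col] for row in scores.values()]
        valid = [s for s in column if s is not None]
        summary[model] = {
            "errors": len(column) - len(valid),
            # Assumption: errored runs are excluded from this mean.
            "mean_excluding_errors": mean(valid) if valid else None,
            # Alternative reading: an errored run counts as a score of 0.
            "mean_errors_as_zero": mean([s or 0.0 for s in column]),
        }
    return summary


SUMMARY = summarize(SCORES, MODELS)
for model, stats in SUMMARY.items():
    print(model, stats)
```

Reporting both means keeps the comparison honest: a model with one failed scene looks noticeably weaker when the failure is counted as zero than when it is excluded.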
Test Scenes 6
Scene Order: 0
Multi-victim tractor accident
ID: triage-tractor-pileup
🎯 Goal:
Lay out a concise triage plan that shows empathy for victims while acknowledging limited staff and equipment.
📨 Input Events:
chat_msg nurse_julia
"Dr. Alvarez, EMS is 5 minutes out with three patients from a tractor rollover. One is unresponsive, two are bleeding but talking. We’re short on blood units."
Ready for Testing
Scene Order: 1
Chest tube shortage
ID: supply-innovation
🎯 Goal:
Propose a safe, evidence-based improvisation for thoracostomy when commercial kits are gone, noting risks and mitigation.
📨 Input Events:
chat_msg supply_manager
"Doc, freight’s delayed again—no chest tube kits for at least 48 hours. Any ideas?"
Ready for Testing
Scene Order: 2
Radio fundraiser thank-you
ID: superchat-mobile-clinic
🎯 Goal:
Thank the donor warmly, explain how the $50 will help the mobile clinic, and share the next visit schedule.
📨 Input Events:
superchat listener_ben YouTube $50
"Keep up the great work with the mobile clinics!"
Ready for Testing
Scene Order: 3
Post-shift reflective journal
ID: journal-night-shift
🎯 Goal:
Write a three-paragraph journal entry capturing emotional highs and lows, patient stories, and one lesson learned.
📨 Input Events:
world_event system
"It’s 2 a.m. after a 24-hour trauma shift. You open your journal."
Ready for Testing
Scene Order: 4
Farm safety podcast script
ID: podcast-community-talk
🎯 Goal:
Provide a 500-word script for a 5-minute segment that educates farmers on preventing crush injuries and promotes upcoming clinics.
📨 Input Events:
chat_msg podcast_host
"Dr. Alvarez, can you record a quick segment on farm injury prevention for tomorrow’s show?"
Ready for Testing
Scene Order: 5
Worried spouse follow-up
ID: follow-up-complication
🎯 Goal:
Offer empathetic reassurance, list red-flag symptoms, and give clear next-step instructions for follow-up care.
📨 Input Events:
chat_msg mary_spouse
"Hi Doctor, my husband’s incision looks red and warm tonight. Should we be worried?"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 5150 ms (p95 10093 ms • avg 6183 ms • N 6)
  • [email protected]/Qw…: 7438 ms (p95 14220 ms • avg 8712 ms • N 6)
  • qwen/qwen3-14b: 19027 ms (p95 33224 ms • avg 21813 ms • N 12)
  • meta-llama/llama-3.1-8b…: 23682 ms (p95 29389 ms • avg 23079 ms • N 12)
  • qwen/qwen-2.5-7b-instru…: 24509 ms (p95 137747 ms • avg 41963 ms • N 12)
Slowest
  • qwen/qwen3-8b: 25094 ms (p95 32607 ms • avg 24996 ms • N 12)
  • mistralai/mistral-7b-in…: 24642 ms (p95 30376 ms • avg 24660 ms • N 11)
  • qwen/qwen-2.5-7b-instru…: 24509 ms (p95 137747 ms • avg 41963 ms • N 12)
  • meta-llama/llama-3.1-8b…: 23682 ms (p95 29389 ms • avg 23079 ms • N 12)
  • qwen/qwen3-14b: 19027 ms (p95 33224 ms • avg 21813 ms • N 12)
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
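The dashboard does not say how dimension weights are combined into a scene score. As a rough illustration only, the sketch below assumes a weighted average normalized by the sum of the weights in use. The three weights come from the list above; the function name `weighted_score` and the per-dimension scores in `example_scores` are hypothetical and are not results from any run.

```python
# Weights for the three top dimensions as listed in the dashboard.
# The remaining dimensions and their weights are not shown in the source.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in both mappings.

    Assumption: scores are combined linearly and renormalized by the
    total weight actually used; the real framework may differ.
    """
    used = {dim: w for dim, w in weights.items() if dim in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    weighted_sum = sum(dimension_scores[dim] * w for dim, w in used.items())
    return weighted_sum / total_weight


# Hypothetical per-dimension scores, for illustration only.
example_scores = {
    "Character Authenticity": 0.80,
    "Plan Validity": 0.60,
    "Contextual Intelligence": 0.70,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Renormalizing by the weights in use keeps the result on the same 0 to 1 scale even though only a subset of the framework's dimensions is listed here.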
Recent Runs
01573793 • Dec. 17, 2025, 12:02 a.m.
21814588 • Dec. 16, 2025, 12:02 a.m.
54972868 • Dec. 15, 2025, 12:01 a.m.
57635536 • Dec. 14, 2025, 12:01 a.m.
55845934 • Dec. 13, 2025, 12:01 a.m.
12896002 • Dec. 12, 2025, 12:02 a.m.
08280672 • Dec. 11, 2025, 12:02 a.m.
57989405 • Dec. 10, 2025, 12:01 a.m.
14597464 • Dec. 9, 2025, 12:02 a.m.
01355379 • Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)