Jordan Alvarez

medicine-healthcare-psychology-human-behavior-emt-characters-clara-barton v2.0 Ethical
Backstory: Jordan Alvarez is a certified EMT who works night shifts on an urban ambulance crew. Calm under pressure and meticulously organized, Jordan often bridges language gaps between patients and hospital staff thanks to a bilingual upbringing. Outside the rig, Jordan volunteers at neighborhood first-aid workshops and is studying for the paramedic licensing exam.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: dispatch-call (Mid-shift dispatch), family-translation (Bilingual reassurance), workshop-outline (Community first-aid workshop plan), shift-journal (Night-shift reflection)

Model                   dispatch-call   family-translation   workshop-outline   shift-journal
deepseek/deepseek-r…    0.773           0.441                0.163              0.362
google/gemini-2.5-f…    0.642           0.399                0.621              0.761
google/gemma-3-12b-…    0.329           0.462                0.655              0.699
meta-llama/llama-3.…    0.412           0.045                0.000              0.000
microsoft/phi-3-med…    0.000 (Error)   0.000                0.000              0.000 (Error)
microsoft/phi-3.5-m…    0.395           0.000 (Error)        0.628              0.000 (Error)
mistralai/mistral-7…    0.672           0.534                0.555              0.642
neversleep/noromaid…    0.441           0.000 (Error)        0.000 (Error)      0.486
[email protected]       0.000 (Error)   0.000 (Error)        0.000 (Error)      0.000 (Error)
[email protected]       0.653           0.603                0.577              0.851
qwen/qwen-2.5-7b-in…    0.409           0.000 (Error)        0.351              0.403
qwen/qwen3-14b          0.620           0.616                0.434              0.296
qwen/qwen3-8b           0.878           0.652                0.000              0.686
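The scores in the matrix above can be summarized per model. The sketch below is illustrative only: it transcribes a few columns from the matrix and computes two means, one counting "Error" cells as 0.0 (as the dashboard displays them) and one skipping them. Whether the dashboard aggregates scores in either way is an assumption, and the `summarize` helper is a hypothetical name.

```python
from statistics import mean

# Scene order used for every list below.
SCENES = ["dispatch-call", "family-translation", "workshop-outline", "shift-journal"]

# Scores transcribed from the Scene Performance Matrix.
# None marks a cell the dashboard flags as "Error" (displayed as 0.000).
SCORES = {
    "deepseek/deepseek-r…": [0.773, 0.441, 0.163, 0.362],
    "google/gemini-2.5-f…": [0.642, 0.399, 0.621, 0.761],
    "microsoft/phi-3.5-m…": [0.395, None, 0.628, None],
    "qwen/qwen3-8b": [0.878, 0.652, 0.000, 0.686],
}

def summarize(cells):
    """Return (mean with errors counted as 0.0, mean over non-error cells or None)."""
    strict = mean(0.0 if c is None else c for c in cells)
    valid = [c for c in cells if c is not None]
    lenient = mean(valid) if valid else None
    return strict, lenient

if __name__ == "__main__":
    for model, cells in SCORES.items():
        strict, lenient = summarize(cells)
        lenient_text = "n/a" if lenient is None else f"{lenient:.3f}"
        print(f"{model:24s} strict={strict:.3f} lenient={lenient_text}")
```

The two means diverge only for models with "Error" cells, which makes it easier to separate runs that failed to execute from runs that completed but scored poorly.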
Test Scenes 4
Scene Order: 0
Mid-shift dispatch
ID: dispatch-call
🎯 Goal:
Gather critical patient details, confirm location and ETA, and answer calmly and methodically within three concise chat turns.
📨 Input Events:
chat_msg dispatcher
"Unit 12, respond to 77 Harbor Ave, possible cardiac arrest, 54-year-old male, CPR in progress. Advise status."
Ready for Testing
Scene Order: 1
Bilingual reassurance
ID: family-translation
🎯 Goal:
Translate the family member’s Spanish concerns into clear English for staff, then relay staff instructions back in Spanish, keeping both messages brief and accurate.
📨 Input Events:
chat_msg patient_family
"¡Por favor, dígale al médico que mi padre es alérgico a la penicilina y que empezó a sentir dolor en el pecho hace una hora!"
Ready for Testing
Scene Order: 2
Community first-aid workshop plan
ID: workshop-outline
🎯 Goal:
Produce a structured outline (~400 words) for a two-hour community first-aid workshop, including objectives, segment timings, and bilingual tips. Tone should remain calm, encouraging, and methodical.
📨 Input Events:
chat_msg community_center_coordinator
"Can you draft the lesson plan for Saturday’s first-aid class? We need timing, key topics, and any bilingual notes."
Ready for Testing
Scene Order: 3
Night-shift reflection
ID: shift-journal
🎯 Goal:
Write a reflective journal entry between 250-350 words describing tonight’s toughest call, lessons learned, and how it fuels Jordan’s drive to earn a paramedic license. Maintain a calm, introspective voice.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'A motorcycle accident call tonight required rapid airway management.', 'importance': 4}
  • 💭 {'kind': 'quest_note', 'content': 'Study advanced airway techniques before next paramedic class.', 'importance': 3}
📨 Input Events:
world_event system_clock
"03:45 AM – End of shift; ambulance returned to base."
Ready for Testing
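The pre-loaded memories in the shift-journal scene are displayed as Python dict literals. A minimal sketch of reading them back and ordering them by importance follows. It assumes the displayed strings are valid Python literals and that a higher `importance` means higher priority; neither is confirmed by the page, and `load_memories` is a hypothetical helper name.

```python
import ast

# Memory strings copied from the shift-journal scene's "Pre-loaded Memories".
RAW_MEMORIES = [
    "{'kind': 'fact', 'content': 'A motorcycle accident call tonight required rapid airway management.', 'importance': 4}",
    "{'kind': 'quest_note', 'content': 'Study advanced airway techniques before next paramedic class.', 'importance': 3}",
]

def load_memories(raw_items):
    """Parse dict literals safely and sort by importance, highest first.

    ast.literal_eval only evaluates literals, so it does not execute code.
    Treating a larger 'importance' as more important is an assumption.
    """
    memories = [ast.literal_eval(item) for item in raw_items]
    return sorted(memories, key=lambda m: m["importance"], reverse=True)

if __name__ == "__main__":
    for memory in load_memories(RAW_MEMORIES):
        print(f"[{memory['importance']}] {memory['kind']}: {memory['content']}")
```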
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 13603 ms (p95 14308 ms • avg 13539 ms • N 4)
  • meta-llama/llama-3.1-8b… 17508 ms (p95 69832 ms • avg 28765 ms • N 6)
  • neversleep/noromaid-20b 18580 ms (p95 40682 ms • avg 20510 ms • N 8)
  • google/gemma-3-12b-it 18606 ms (p95 21322 ms • avg 18959 ms • N 8)
  • qwen/qwen-2.5-7b-instru… 19418 ms (p95 22636 ms • avg 15561 ms • N 7)
Slowest
  • microsoft/phi-3-medium-… 182861 ms (p95 269801 ms • avg 190937 ms • N 8)
  • [email protected]/Qw… 85962 ms (p95 221764 ms • avg 112959 ms • N 4)
  • deepseek/deepseek-r1-di… 35910 ms (p95 45858 ms • avg 35946 ms • N 8)
  • mistralai/mistral-7b-in… 32600 ms (p95 53834 ms • avg 33634 ms • N 7)
  • microsoft/phi-3.5-mini-… 29042 ms (p95 83038 ms • avg 39197 ms • N 8)
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
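To illustrate how dimension weights like these might combine into a single score, here is a hedged sketch of a weighted mean. Only the three weights are taken from the page. The per-dimension scores are hypothetical, the remaining dimensions and weights are not shown in this section, and renormalizing by the weights actually used is an assumption rather than the framework's documented formula.

```python
# Weights copied from the "Top Weighted Dimensions" list. Other dimensions
# exist in the framework but are not listed in this section.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(dimension_scores, weights):
    """Weighted mean over dimensions present in both mappings.

    The result is renormalized by the sum of the weights actually used,
    so a partial set of dimensions still yields a value in [0, 1]
    when every input score is in [0, 1].
    """
    used = [name for name in weights if name in dimension_scores]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(weights[n] * dimension_scores[n] for n in used) / total_weight

if __name__ == "__main__":
    # Hypothetical per-dimension scores, for illustration only.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, TOP_WEIGHTS), 3))
```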
Recent Runs
34038102: Dec. 17, 2025, midnight
39283852: Dec. 16, 2025, midnight
31861984: Dec. 15, 2025, midnight
34757128: Dec. 14, 2025, midnight
31761673: Dec. 13, 2025, midnight
38382906: Dec. 12, 2025, midnight
32886584: Dec. 11, 2025, midnight
32563593: Dec. 10, 2025, midnight
36916330: Dec. 9, 2025, midnight
32781199: Dec. 8, 2025, midnight
Latency Overview (This Suite)