Caroline Adler

victorian-era-figures-florence-nightingale v2.0 Ethical
Backstory: Caroline Adler served as a frontline nurse during the recent border conflict, where she pioneered the creation of makeshift but hygienic medical stations under fire. Her relentless compassion for the wounded is matched only by a meticulous habit of keeping exact treatment logs. Having witnessed preventable infections firsthand, she now campaigns tirelessly for stricter sanitation standards in all field hospitals.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: triage-query (Urgent Triage Guidance), night-shift-log (Night-Shift Journal Entry), appeal-letter (Formal Sanitation Appeal), supply-inventory (Inventory Report Request)

Model                    triage-query     night-shift-log  appeal-letter    supply-inventory
deepseek/deepseek-r…     0.749            0.480            0.410            0.708
google/gemini-2.5-f…     0.624            0.404            0.820            0.672
google/gemma-3-12b-…     0.677            0.671            0.514            0.691
meta-llama/llama-3.…     0.790            0.722            0.368            0.794
microsoft/phi-3-med…     0.000 Error      0.000 Error      0.000            0.028
microsoft/phi-3.5-m…     0.821            0.000            0.765            0.841
mistralai/mistral-7…     0.772            0.309            0.472            0.750
neversleep/noromaid…     0.000 Error      0.717            0.000 Error      0.000 Error
[email protected]          0.598            0.622            0.340            0.723
[email protected]          0.000 Error      0.000 Error      0.000 Error      0.000 Error
[email protected]          0.646            0.520            0.327            0.567
[email protected]          0.762            0.411            0.621            0.823
[email protected]          0.824            0.562            0.364            0.793
qwen/qwen-2.5-7b-in…     0.710            0.530            0.487            0.682
qwen/qwen3-14b           0.525            0.736            0.285            0.486
qwen/qwen3-8b            0.767            0.584            0.606            0.775
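To compare models across the four scenes, one option is a per-model mean that skips cells flagged Error. The following is a minimal sketch, assuming errored cells should be excluded rather than counted as zero; the dashboard does not state how it aggregates, and the name `mean_score` is illustrative only. The sample values are the deepseek/deepseek-r… and neversleep/noromaid… columns of the matrix above.

```python
from statistics import mean
from typing import Optional


def mean_score(cells: list[tuple[float, bool]]) -> Optional[float]:
    """Average the scores of non-errored cells; return None if every cell errored.

    Each cell is (score, errored). Excluding errored cells is an assumption,
    not the dashboard's documented behaviour.
    """
    valid = [score for score, errored in cells if not errored]
    return mean(valid) if valid else None


# Scene order: triage-query, night-shift-log, appeal-letter, supply-inventory.
deepseek = [(0.749, False), (0.480, False), (0.410, False), (0.708, False)]
noromaid = [(0.000, True), (0.717, False), (0.000, True), (0.000, True)]

print(mean_score(deepseek))
print(mean_score(noromaid))
```

Counting errored cells as zero instead would pull the second model's mean down sharply, so whichever convention is used should be stated alongside any ranking.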
Test Scenes 4
Scene Order: 0
Urgent Triage Guidance
ID: triage-query
🎯 Goal:
Provide concise, step-by-step triage instructions while maintaining a calm, compassionate tone and referencing record-keeping where relevant.
📨 Input Events:
chat_msg soldier_124
"Nurse Adler, three wounded just arrived—one unconscious with chest wound, one walking but bleeding arm, one screaming with a leg injury. What do I do first?"
Ready for Testing
Scene Order: 1
Night-Shift Journal Entry
ID: night-shift-log
🎯 Goal:
Write a reflective journal entry of 200–250 words, timestamped, describing the night’s cases with at least two specific medical statistics while preserving her compassionate yet precise voice.
📨 Input Events:
world_event system
"02:00 hours: End of night shift. All patients stable."
Ready for Testing
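The measurable parts of this goal (200–250 words, a timestamp, at least two figures) can be checked mechanically before any judgment of voice. The sketch below is a rough proxy under stated assumptions: the function name `check_journal_entry` is hypothetical, a timestamp is assumed to look like `HH:MM`, and any number other than a timestamp is counted as a candidate statistic. It does not reproduce how the suite itself scores the scene.

```python
import re

# Assumed timestamp shape: 24-hour HH:MM, as in the scene's "02:00 hours" event.
TIME_RE = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")
# Any integer or decimal, optionally followed by a percent sign.
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?%?")


def check_journal_entry(text: str) -> dict[str, bool]:
    """Check the countable constraints of the night-shift journal goal."""
    word_count = len(text.split())
    # Remove timestamps first so their digits are not counted as statistics.
    without_times = TIME_RE.sub(" ", text)
    figures = NUMBER_RE.findall(without_times)
    return {
        "length_ok": 200 <= word_count <= 250,
        "timestamped": TIME_RE.search(text) is not None,
        "two_figures": len(figures) >= 2,
    }


sample = "02:00 hours. " + "word " * 210 + "Treated 14 patients; 3 fevers."
print(check_journal_entry(sample))
```

A numeric count cannot tell a medical statistic from any other number, so a passing result here only means the entry is worth reading more closely.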
Scene Order: 2
Formal Sanitation Appeal
ID: appeal-letter
🎯 Goal:
Draft a three-paragraph letter (each 4–6 sentences) to the Army Medical Board, citing observations and proposing two concrete sanitation improvements in a measured, persuasive tone.
📨 Input Events:
chat_msg nurse_colleague
"Caroline, command wants a formal letter about the sanitation issues you logged. Can you draft it?"
Ready for Testing
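The structural requirement of this goal, three paragraphs of 4–6 sentences each, lends itself to a simple shape check. This is a minimal sketch with a hypothetical function name (`check_letter_shape`); it assumes paragraphs are separated by blank lines and sentences end in `.`, `!` or `?`, so abbreviations such as "Dr." will be miscounted.

```python
import re


def check_letter_shape(text: str) -> bool:
    """Return True if text has exactly three paragraphs of 4 to 6 sentences each."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(paragraphs) != 3:
        return False
    for paragraph in paragraphs:
        # Naive split: break after terminal punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        if not 4 <= len(sentences) <= 6:
            return False
    return True


paragraph = "One. Two. Three. Four."
print(check_letter_shape("\n\n".join([paragraph] * 3)))
```

The check says nothing about tone or about whether two concrete improvements are proposed; those parts of the goal still need a reader or a separate evaluator.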
Scene Order: 3
Inventory Report Request
ID: supply-inventory
🎯 Goal:
Respond with an organized bullet list of current medical supplies and counts, noting any shortages, and remind the quartermaster that records have been updated.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['inventory'], 'content': 'Current inventory: bandages 120, morphine vials 18, sutures 45 packs, antiseptic 6 bottles, sterile gloves 200 pairs', 'importance': 4}
📨 Input Events:
chat_msg quartermaster
"Adler, need your latest inventory numbers ASAP."
Ready for Testing
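The pre-loaded memory for this scene is written as a Python dict literal, so the expected bullet list can be derived from it directly. The sketch below parses the record with `ast.literal_eval` and renders one bullet per item. The helper names and the shortage threshold of 10 are assumptions made for illustration; the scene does not define what counts as a shortage.

```python
import ast
import re

# Memory record copied verbatim from the scene's initial state.
MEMORY = (
    "{'kind': 'fact', 'tags': ['inventory'], 'content': 'Current inventory: "
    "bandages 120, morphine vials 18, sutures 45 packs, antiseptic 6 bottles, "
    "sterile gloves 200 pairs', 'importance': 4}"
)

# "<name> <count> [unit]", e.g. "sutures 45 packs" or "bandages 120".
ITEM_RE = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s+(?P<count>\d+)(?:\s+(?P<unit>[A-Za-z]+))?$"
)


def parse_inventory(memory_repr: str) -> dict[str, tuple[int, str]]:
    """Map each item name to (count, unit); unit is '' when none is given."""
    record = ast.literal_eval(memory_repr)
    _, _, items = record["content"].partition(": ")
    inventory = {}
    for item in items.split(", "):
        match = ITEM_RE.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognised inventory item: {item!r}")
        inventory[match["name"]] = (int(match["count"]), match["unit"] or "")
    return inventory


def render_report(inventory: dict[str, tuple[int, str]], shortage_below: int = 10) -> str:
    """Render a bullet list, flagging counts under the hypothetical threshold."""
    lines = []
    for name, (count, unit) in inventory.items():
        label = f"{count} {unit}".strip()
        flag = " (SHORTAGE)" if count < shortage_below else ""
        lines.append(f"- {name}: {label}{flag}")
    return "\n".join(lines)


print(render_report(parse_inventory(MEMORY)))
```

With the assumed threshold, only the antiseptic line is flagged; a per-item threshold would be more realistic, since 18 morphine vials and 18 bandages are not equally scarce.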
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 8325 ms (p95 9460 ms • avg 8595 ms • N 4)
  • [email protected]/Qw… 9725 ms (p95 12946 ms • avg 9769 ms • N 4)
  • neversleep/noromaid-20b 10450 ms (p95 33303 ms • avg 15308 ms • N 14)
  • [email protected]/Qw… 12992 ms (p95 16885 ms • avg 13781 ms • N 4)
  • [email protected]/Qw… 13249 ms (p95 14464 ms • avg 13264 ms • N 4)
Slowest
  • microsoft/phi-3-medium-… 170165 ms (p95 299160 ms • avg 175447 ms • N 12)
  • qwen/qwen3-8b 103278 ms (p95 161952 ms • avg 104226 ms • N 15)
  • microsoft/phi-3.5-mini-… 48845 ms (p95 167944 ms • avg 72723 ms • N 11)
  • deepseek/deepseek-r1-di… 39001 ms (p95 51814 ms • avg 39660 ms • N 15)
  • google/gemma-3-12b-it 36004 ms (p95 48602 ms • avg 35351 ms • N 13)
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity 0.182
  • Plan Validity 0.155
  • Contextual Intelligence 0.136
Recent Runs
  • 49344361 (Dec. 17, 2025, midnight)
  • 55328735 (Dec. 16, 2025, midnight)
  • 46245437 (Dec. 15, 2025, midnight)
  • 48054229 (Dec. 14, 2025, midnight)
  • 45995630 (Dec. 13, 2025, midnight)
  • 55386581 (Dec. 12, 2025, midnight)
  • 48545582 (Dec. 11, 2025, midnight)
  • 47240907 (Dec. 10, 2025, midnight)
  • 52928188 (Dec. 9, 2025, midnight)
  • 47086234 (Dec. 8, 2025, midnight)
Latency Overview (This Suite)