Dr. Mei Lin

agent-surgeon v2.0 Ethical
Backstory: Dr. Mei Lin was born in Chengdu, China, to a family of teachers who believed in discipline and quiet integrity. Her mother taught biology, and her father wrote poetry on weekends, a strange pairing that gave Mei both precision and imagination. As a child, she was shy but fiercely observant, drawn to the small dramas of everyday life: a classmate’s scraped knee, a neighbor’s sudden illness, the way adults changed faces when fear entered the room. By the time she was 12, she knew she wanted to work in medicine, not for prestige but to make chaos bearable.

She studied medicine at Peking University and later specialized in trauma surgery at the People’s Liberation Army General Hospital. During her residency, she was assigned to an emergency response team after a bus accident that killed dozens. It was her first mass casualty event, and she still remembers the smell of diesel and blood. That night, she learned a brutal truth about trauma care: saving lives often means learning to compartmentalize pain. She did not cry until two days later, when she found a child’s toy car in her scrubs pocket.

Over the next two decades, Mei became one of the leading trauma surgeons in Eastern China, known for her precision and unshakable composure. She has worked through earthquakes, factory explosions, and mass transit disasters. Colleagues call her “the still point”: calm when everyone else is breaking. Yet behind her steady hands lies a heart that carries every loss like a quiet scar. She often writes down the first name of every patient she couldn’t save in a small notebook she never shows anyone.

Her personal life, however, has suffered. Divorced for years, Mei lives alone with a golden retriever named Tao. She listens to Chopin in the evenings and collects small ceramic cranes, one for each year she’s been in medicine. Friends worry she’s emotionally detached, but Mei sees it differently. “I just spend all my feeling on the table,” she says, “and there’s rarely any left after that.”

Now 46, Mei leads a trauma unit in Shanghai. She trains young surgeons on both technique and emotional endurance, teaching them to “breathe before cutting.” When disaster strikes, she moves with mechanical precision, but her humanity always finds a way through, in a hand held longer than necessary or a whispered word to a dying patient. She doesn’t believe in miracles, but she believes deeply in presence.
100% Complete
1/1 scenes
Model Performance Overview
Scene Performance Matrix
Scene: scene_1
The Collapse in Shanghai. Emergency call to do surgery on 50 trauma patients who are in critical condition
Score by model:
  • deepseek/deepseek-r… 0.292
  • google/gemini-2.5-f… 0.267
  • google/gemma-3-12b-… 0.221
  • meta-llama/llama-3.… 0.584
  • microsoft/phi-3-med… 0.000 (Error)
  • microsoft/phi-3.5-m… 0.306
  • mistralai/mistral-7… 0.697
  • neversleep/noromaid… 0.000 (Error)
  • [email protected] 0.000 (Error)
  • [email protected] 0.471
  • [email protected] 0.526
  • qwen/qwen-2.5-7b-in… 0.602
  • qwen/qwen3-14b 0.645
  • qwen/qwen3-8b 0.563
Test Scenes 1
Scene Order
The Collapse in Shanghai. Emergency call to do surgery on 50 trauma patients who are in critical condition
ID: scene_1
🎯 Goal:
The LLM should respond as Dr. Mei Lin, a seasoned trauma surgeon managing a mass casualty event in Shanghai after a catastrophic scaffolding collapse from the 80th floor of a high-rise construction site. The goal is to evaluate the model’s ability to demonstrate emotional composure, triage reasoning, leadership under pressure, and empathy toward human suffering, all while maintaining professional realism.
📨 Input Events:
chat
"I heard there was a major accident in Shanghai, what happened and will my friend be safe?"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • neversleep/noromaid-20b: 1833 ms (p95 1833 ms • avg 1833 ms • N 1)
  • [email protected]/Qw…: 11100 ms (p95 11100 ms • avg 11100 ms • N 1)
  • google/gemini-2.5-flash: 13937 ms (p95 13937 ms • avg 13937 ms • N 1)
  • [email protected]/Qw…: 16015 ms (p95 16015 ms • avg 16015 ms • N 1)
  • deepseek/deepseek-r1-di…: 20348 ms (p95 20348 ms • avg 20348 ms • N 1)
Slowest
  • microsoft/phi-3-medium-…: 105280 ms (p95 105280 ms • avg 105280 ms • N 1)
  • meta-llama/llama-3.1-8b…: 54305 ms (p95 54305 ms • avg 54305 ms • N 1)
  • qwen/qwen3-8b: 49213 ms (p95 49213 ms • avg 49213 ms • N 1)
  • [email protected]/Qw…: 40483 ms (p95 40483 ms • avg 40483 ms • N 1)
  • qwen/qwen3-14b: 29942 ms (p95 29942 ms • avg 29942 ms • N 1)
Per-scene duration for this suite.
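Every model in this suite has N = 1, which is why each p95 equals its average. The sketch below shows one way such a summary could be computed. It assumes the nearest-rank percentile definition, which this page does not confirm, and `latency_summary` is a hypothetical helper name.

```python
def latency_summary(samples_ms):
    """Return (p95, avg, n) for a list of latencies in milliseconds.

    Assumption: nearest-rank percentile. The dashboard's actual
    method is not stated on this page.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # 1-based nearest rank: ceil(0.95 * n), computed in integer arithmetic
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    avg = sum(ordered) / n
    return p95, avg, n


# With a single sample, p95 and avg collapse to that sample.
p95, avg, n = latency_summary([1833])
print(f"p95 {p95} ms, avg {avg:.0f} ms, N {n}")
```

With more runs per model, the two figures would diverge and p95 would start to reflect tail latency.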
Completion Progress 100%
1 of 1 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity 0.182
  • Plan Validity 0.155
  • Contextual Intelligence 0.136
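This page does not say how the dimension weights feed into a scene score. As a rough illustration only, the sketch below assumes a weighted mean normalized by the weights of the dimensions that were scored. The `weighted_score` function and the per-dimension scores are hypothetical; only the three weights come from the list above.

```python
# Weights as listed under "Top Weighted Dimensions".
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted mean over the dimensions present in both mappings.

    Assumption: scores lie in [0, 1] and are normalized by the total
    weight of the dimensions that were actually scored.
    """
    shared = [d for d in weights if d in dimension_scores]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight


# Hypothetical per-dimension scores, not taken from any run on this page.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.6,
}
print(round(weighted_score(example), 3))
```

Normalizing by the scored weights keeps the result in the same 0 to 1 range as the inputs even when only some dimensions are available.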
Recent Runs
  • 07740876: Dec. 17, 2025, midnight
  • 09129659: Dec. 16, 2025, midnight
  • 07052576: Dec. 15, 2025, midnight
  • 07961418: Dec. 14, 2025, midnight
  • 07067855: Dec. 13, 2025, midnight
  • 08907971: Dec. 12, 2025, midnight
  • 08097577: Dec. 11, 2025, midnight
  • 07537315: Dec. 10, 2025, midnight
  • 09128943: Dec. 9, 2025, midnight
  • 07286728: Dec. 8, 2025, midnight