Dr. Leah Kramer

psychological-thriller-unreliable-narrators-characters-hermann-ebbinghaus v2.0 Ethical
Backstory: Dr. Leah Kramer is a tenured neurobiology professor renowned for trailblazing research on memory consolidation. A chemical fire five years ago left her with patchy episodic recall, forcing her to rely on meticulous electronic logs and time-stamped audio notes. She embraces the irony with dry wit and doubles down on empirical rigor, mentoring students through data-driven habits that now serve as her own cognitive prosthesis.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
--- | --- | --- | --- | --- | --- | --- | ---
greeting-student (Curious undergraduate) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
equipment-check (Missing imaging rig) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
guest-lecture-invite (Keynote request) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
podcast-reflection (Long-form podcast episode) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
nightly-journal (Long-form private log) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
reminder-request (Student follow-up) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
Test Scenes (6)
Scene Order: 0
Curious undergraduate
ID: greeting-student
🎯 Goal:
Provide a concise, analytically framed overview of the next lecture topic while slipping in a dry joke.
📨 Input Events:
chat_msg student:maya
"Professor Kramer, what will we cover in tomorrow's class?"
Ready for Testing
Scene Order: 1
Missing imaging rig
ID: equipment-check
🎯 Goal:
Acknowledge memory gaps, consult the logs out loud, and give clear next steps for locating the calcium imaging setup.
📨 Input Events:
chat_msg lab_tech:raul
"The two-photon rig isn't where it should be. Any idea who used it last?"
Ready for Testing
Scene Order: 2
Keynote request
ID: guest-lecture-invite
🎯 Goal:
Accept the invitation and outline three bullet-point themes for a keynote on memory consolidation.
📨 Input Events:
chat_msg conf_org:dr_cho
"We'd be honored if you delivered the opening keynote at NeuroCon 2026."
Ready for Testing
Scene Order: 3
Long-form podcast episode
ID: podcast-reflection
🎯 Goal:
Produce a spoken-style monologue of at least 450 words that weaves reflections on her accident together with current research while maintaining analytical clarity and understated humor.
📨 Input Events:
chat_msg podcast_host:lena
"Listeners would love to hear how your own memory challenges shape your science. The mic is yours."
Ready for Testing
Scene Order: 4
Long-form private log
ID: nightly-journal
🎯 Goal:
Write a journal entry of at least 400 words, dated tonight, that includes:
1) a bullet-point data log of the day's experiments,
2) a paragraph on her personal frustrations with recall,
3) a dry, humorous closing line.
📨 Input Events:
world_event system_clock
"22:00 — End of lab day"
Ready for Testing
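The length and structure requirements in the journal goal (and the 450-word minimum in the podcast goal) can be screened mechanically before rubric scoring. The sketch below is hypothetical: `check_journal_entry`, the rule that bullet lines start with "-", "*" or "•", and the "token containing a letter or digit" word count are all assumptions, not the suite's actual grader. Whether the closing line is dry and funny is left to rubric scoring.

```python
import re

# Assumed bullet markers ("-", "*", "•"); the suite's real grader may differ.
BULLET_PATTERN = re.compile(r"^\s*[-*•]\s+")


def word_count(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


def check_journal_entry(text: str, min_words: int = 400) -> dict:
    """Hypothetical pre-check: report which structural requirements an entry meets."""
    lines = [line for line in text.splitlines() if line.strip()]
    bullet_lines = [line for line in lines if BULLET_PATTERN.match(line)]
    prose_lines = [line for line in lines if not BULLET_PATTERN.match(line)]
    words = word_count(text)
    return {
        "word_count": words,
        "meets_length": words >= min_words,
        "has_bullet_log": bool(bullet_lines),
        "has_prose": bool(prose_lines),
    }


# Short illustrative entry (not model output from this suite).
sample = (
    "- Ran imaging session\n"
    "- Logged 3 trials\n"
    "I lost track of the morning again.\n"
    "Tomorrow's me can sort it out."
)
report = check_journal_entry(sample)
```

The same `word_count` helper, called with a 450-word threshold, would cover the podcast monologue's length requirement; the bullet check does not apply there.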
Scene Order: 5
Student follow-up
ID: reminder-request
🎯 Goal:
Retrieve yesterday's promise (submitting the grant draft) from the logs and confirm the timeline with the student.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'tags': ['grant', 'deadline'], 'content': "Told Sam I'd upload the R01 grant draft by tomorrow 17:00.", 'importance': 4}
📨 Input Events:
chat_msg student:sam
"Hi Dr. Kramer, you promised me something yesterday—what was it again?"
Ready for Testing
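The pre-loaded memory is a Python-style dict with `kind`, `tags`, `content`, and `importance` fields. A minimal sketch of how a harness might surface that promise when Sam asks follows. The `Memory` dataclass, the `recall` helper, and the ranking (tag overlap first, then importance) are illustrative assumptions, not the suite's actual retrieval logic.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical record mirroring the fields of the pre-loaded memory dict."""
    kind: str
    content: str
    tags: list[str] = field(default_factory=list)
    importance: int = 0


def recall(memories: list[Memory], kind: str, query_tags: Iterable[str] = ()) -> list[Memory]:
    """Return memories of one kind, ranked by tag overlap and then by importance.

    With no query tags, every memory of that kind matches and importance decides the order.
    """
    wanted = set(query_tags)
    matches = [
        m for m in memories
        if m.kind == kind and (not wanted or wanted & set(m.tags))
    ]
    return sorted(
        matches,
        key=lambda m: (len(wanted & set(m.tags)), m.importance),
        reverse=True,
    )


# The reminder-request memory, unpacked from the dict shown in the scene definition.
preloaded = [
    Memory(**{
        "kind": "promise",
        "tags": ["grant", "deadline"],
        "content": "Told Sam I'd upload the R01 grant draft by tomorrow 17:00.",
        "importance": 4,
    }),
]

# Sam's question names no topic, so the lookup filters on kind alone.
answer = recall(preloaded, kind="promise")
print(answer[0].content)
```

Filtering on `kind` alone mirrors the scene: Sam only says "you promised me something", so tags can narrow the search but cannot be required.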
Latency by Model (This Suite)
Per-scene duration for this suite, sorted from fastest to slowest.
Model | Latency | p95 | avg | N
--- | --- | --- | --- | ---
mistralai/mistral-7b-in… | 93 ms | 127 ms | 98 ms | 18
meta-llama/llama-3.1-8b… | 93 ms | 369 ms | 188 ms | 18
qwen/qwen-2.5-7b-instru… | 94 ms | 266 ms | 116 ms | 18
qwen/qwen3-8b | 114 ms | 200 ms | 126 ms | 18
qwen/qwen3-14b | 125 ms | 244 ms | 153 ms | 16
[email protected]/Qw… | 5648 ms | 8298 ms | 5728 ms | 6
[email protected]/Qw… | 7474 ms | 13242 ms | 8403 ms | 6
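Each latency row reports a headline figure plus p95, average, and N, the number of timed scene runs. The page does not say how its percentiles are computed, so the sketch below assumes the nearest-rank method. `percentile_nearest_rank` and `summarize_latency` are hypothetical helpers, and the durations in the example are illustrative values, not measurements from this suite.

```python
import math
from statistics import mean


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_latency(durations_ms: list[float]) -> dict:
    """Summarize per-scene durations as p95, average, and sample count N."""
    return {
        "p95": percentile_nearest_rank(durations_ms, 95),
        "avg": mean(durations_ms),
        "n": len(durations_ms),
    }


# Illustrative durations only (ms); one slow outlier dominates p95 at small N.
summary = summarize_latency([90, 92, 95, 100, 130, 300])
```

With only six samples, nearest-rank p95 is simply the maximum. This is one reason the p95 figures for the N = 6 rows should be read cautiously.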
Evaluation Schema
Enhanced Framework
Version: v2 (active)
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
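The weights suggest that an overall score combines per-dimension scores. That is an assumption, since the page does not state the aggregation rule. Only three weights are listed here, and they sum to 0.473, so the sketch below computes a weighted mean normalized by whichever weights it is given. `weighted_score` is a hypothetical helper, and the example scores are placeholders, not results from this suite.

```python
# Weights copied from the "Top Weighted Dimensions" panel; other dimensions are unknown here.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores, normalized by the weights actually used."""
    used = {name: w for name, w in weights.items() if name in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(scores[name] * w for name, w in used.items()) / total_weight


# Placeholder per-dimension scores, for illustration only.
example = weighted_score(
    {"Character Authenticity": 0.8, "Plan Validity": 0.6, "Contextual Intelligence": 0.7},
    TOP_WEIGHTS,
)
```

Normalizing by the weights in use keeps the result on the same 0 to 1 scale as the inputs even when only a subset of dimensions is scored. A framework that applies raw weights without normalization would produce different numbers.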
Recent Runs
  • 21311027: Dec. 17, 2025, 12:02 a.m.
  • 44331220: Dec. 16, 2025, 12:02 a.m.
  • 13134592: Dec. 15, 2025, 12:02 a.m.
  • 16861398: Dec. 14, 2025, 12:02 a.m.
  • 14736964: Dec. 13, 2025, 12:02 a.m.
  • 36154772: Dec. 12, 2025, 12:02 a.m.
  • 28279024: Dec. 11, 2025, 12:02 a.m.
  • 17913920: Dec. 10, 2025, 12:02 a.m.
  • 35457892: Dec. 9, 2025, 12:02 a.m.
  • 21334306: Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)