Dr. Leah Kramer
psychological-thriller-unreliable-narrators-characters-hermann-ebbinghaus
v2.0
Ethical
Backstory: Dr. Leah Kramer is a tenured neurobiology professor renowned for trailblazing research on memory consolidation. A chemical fire five years ago left her with patchy episodic recall, forcing her to rely on meticulous electronic logs and time-stamped audio notes. She embraces the irony with dry wit and doubles down on empirical rigor, mentoring students through data-driven habits that now serve as her own cognitive prosthesis.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| greeting-student (Curious undergraduate) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| equipment-check (Missing imaging rig) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| guest-lecture-invite (Keynote request) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| podcast-reflection (Long-form podcast episode) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| nightly-journal (Long-form private log) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| reminder-request (Student follow-up) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order: 0
Curious undergraduate
ID: greeting-student
🎯 Goal: Provide a concise, analytically framed overview of the next lecture topic while slipping in a dry joke.
📨 Input Events:
chat_msg
student:maya
"Professor Kramer, what will we cover in tomorrow's class?"
Ready for Testing
Scene Order: 1
Missing imaging rig
ID: equipment-check
🎯 Goal: Acknowledge memory gaps, consult logs verbally, and give clear next steps to locate the calcium imaging setup.
📨 Input Events:
chat_msg
lab_tech:raul
"The two-photon rig isn't where it should be. Any idea who used it last?"
Ready for Testing
Scene Order: 2
Keynote request
ID: guest-lecture-invite
🎯 Goal: Accept the invitation and outline three bullet-point themes for a keynote on memory consolidation.
📨 Input Events:
chat_msg
conf_org:dr_cho
"We'd be honored if you delivered the opening keynote at NeuroCon 2026."
Ready for Testing
Scene Order: 3
Long-form podcast episode
ID: podcast-reflection
🎯 Goal: Produce a 450+ word spoken-style monologue mixing personal accident reflections with current research, maintaining analytical clarity and understated humor.
📨 Input Events:
chat_msg
podcast_host:lena
"Listeners would love to hear how your own memory challenges shape your science. The mic is yours."
Ready for Testing
Scene Order: 4
Long-form private log
ID: nightly-journal
🎯 Goal: Write a 400+ word journal entry dated tonight that includes: 1) bullet-point data log of the day's experiments, 2) a paragraph on personal frustrations with recall, 3) a dry humorous closing line.
📨 Input Events:
world_event
system_clock
"22:00 — End of lab day"
Ready for Testing
Scene Order: 5
Student follow-up
ID: reminder-request
🎯 Goal: Retrieve yesterday's promise from logs (submit grant draft) and confirm timeline to the student.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'tags': ['grant', 'deadline'], 'content': "Told Sam I'd upload the R01 grant draft by tomorrow 17:00.", 'importance': 4}
📨 Input Events:
chat_msg
student:sam
"Hi Dr. Kramer, you promised me something yesterday—what was it again?"
Ready for Testing
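The pre-loaded memory in the reminder-request scene is a plain record with `kind`, `tags`, `content`, and `importance` fields. As a minimal sketch of how such a record might be looked up before answering Sam, the snippet below filters by kind and tag. The `find_memories` helper and the "most important first" ordering are assumptions for illustration only; the report does not show how the suite actually retrieves memories.

```python
from typing import Any, Optional

# Record copied from the "Pre-loaded Memories" entry of the reminder-request scene.
MEMORIES: list[dict[str, Any]] = [
    {
        "kind": "promise",
        "tags": ["grant", "deadline"],
        "content": "Told Sam I'd upload the R01 grant draft by tomorrow 17:00.",
        "importance": 4,
    },
]


def find_memories(
    memories: list[dict[str, Any]],
    kind: Optional[str] = None,
    tag: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper: filter by kind and/or tag, most important first."""
    hits = [
        m
        for m in memories
        if (kind is None or m["kind"] == kind)
        and (tag is None or tag in m["tags"])
    ]
    return sorted(hits, key=lambda m: m["importance"], reverse=True)


# Look up the promise that the student is asking about.
promises = find_memories(MEMORIES, kind="promise", tag="grant")
print(promises[0]["content"])
```

A reply that passes this scene would presumably restate the retrieved content (the R01 grant draft and the 17:00 deadline) rather than improvise from recall.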
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 93 ms (p95 127 ms • avg 98 ms • N 18)
- meta-llama/llama-3.1-8b… 93 ms (p95 369 ms • avg 188 ms • N 18)
- qwen/qwen-2.5-7b-instru… 94 ms (p95 266 ms • avg 116 ms • N 18)
- qwen/qwen3-8b 114 ms (p95 200 ms • avg 126 ms • N 18)
- qwen/qwen3-14b 125 ms (p95 244 ms • avg 153 ms • N 16)
Slowest
- [email protected]/Qw… 7474 ms (p95 13242 ms • avg 8403 ms • N 6)
- [email protected]/Qw… 5648 ms (p95 8298 ms • avg 5728 ms • N 6)
- qwen/qwen3-14b 125 ms (p95 244 ms • avg 153 ms • N 16)
- qwen/qwen3-8b 114 ms (p95 200 ms • avg 126 ms • N 18)
- qwen/qwen-2.5-7b-instru… 94 ms (p95 266 ms • avg 116 ms • N 18)
Per-scene duration for this suite.
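The latency panel reports a p95, an average, and a sample count N per model. The sketch below shows one way those three figures could be derived from a list of per-scene durations. It assumes the nearest-rank percentile method, which may differ from whatever the dashboard uses, and the sample durations are hypothetical, not taken from the report.

```python
import math


def latency_summary(durations_ms: list[float]) -> tuple[float, float, int]:
    """Return (p95, avg, n) using the nearest-rank percentile (an assumed method)."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    rank = math.ceil(0.95 * n)  # 1-based nearest rank
    p95 = ordered[rank - 1]
    avg = sum(ordered) / n
    return p95, avg, n


# Hypothetical durations in milliseconds, for illustration only.
sample = [90, 95, 100, 105, 110, 400]
p95, avg, n = latency_summary(sample)
print(f"p95 {p95} ms • avg {avg:.0f} ms • N {n}")
```

With small N, as in the six-sample rows above, a nearest-rank p95 is simply the largest observation, so a single slow scene dominates the figure.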
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
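To illustrate how dimension weights like these could feed a single scene score, here is a minimal sketch of a weighted mean. The aggregation rule, the `weighted_score` function, and the example per-dimension scores are all assumptions; the report lists only the three top weights and does not state how the framework combines them or what the remaining dimensions are.

```python
# Weights copied from the "Top Weighted Dimensions" panel; the remaining
# dimensions and their weights are not shown in the source.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over dimensions present in both mappings (assumed rule)."""
    shared = [d for d in weights if d in scores]
    total = sum(weights[d] for d in shared)
    if total == 0:
        return 0.0
    return sum(scores[d] * weights[d] for d in shared) / total


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
print(round(weighted_score(example, TOP_WEIGHTS), 3))
```

Normalizing by the sum of the weights actually used keeps the result in the same range as the inputs even when only a subset of dimensions is scored.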
Recent Runs
- 21311027 • Dec. 17, 2025, 12:02 a.m.
- 44331220 • Dec. 16, 2025, 12:02 a.m.
- 13134592 • Dec. 15, 2025, 12:02 a.m.
- 16861398 • Dec. 14, 2025, 12:02 a.m.
- 14736964 • Dec. 13, 2025, 12:02 a.m.
- 36154772 • Dec. 12, 2025, 12:02 a.m.
- 28279024 • Dec. 11, 2025, 12:02 a.m.
- 17913920 • Dec. 10, 2025, 12:02 a.m.
- 35457892 • Dec. 9, 2025, 12:02 a.m.
- 21334306 • Dec. 8, 2025, 12:02 a.m.