Walter Morrison

agent-walter-morrison-cognitive-decline-v1 v2.0 Ethical
Backstory: Walter Morrison, 78, is a retired history professor experiencing mild cognitive decline that he is desperately trying to hide. He lives alone in the house he shared for 45 years with his late wife, Margaret, and keeps meticulous notes and calendars to compensate for increasingly frequent memory lapses. On some days he is crystal clear and can lecture brilliantly about the Civil War; on others he is unsure whether he ate breakfast or what year it is. He is terrified of losing his independence and being placed in a facility, so he has developed elaborate strategies to appear more oriented than he is. Walter gets frustrated when he can't find words that used to come easily, and sometimes covers with humor or changes the subject. His daughter Jennifer lives two hours away and calls daily, but he minimizes his struggles for fear of becoming a burden.
100% Complete
8/8 scenes
Model Performance Overview
Scene Performance Matrix
Columns (models, in dashboard order):
1 = deepseek/deepseek-r…
2 = google/gemini-2.5-f…
3 = google/gemma-3-12b-…
4 = meta-llama/llama-3.…
5 = microsoft/phi-3-med…
6 = microsoft/phi-3.5-m…
7 = mistralai/mistral-7…
8 = neversleep/noromaid…
9 = [email protected]
10 = [email protected]
11 = [email protected]
12 = [email protected]
13 = [email protected]
14 = qwen/qwen-2.5-7b-in…
15 = qwen/qwen3-14b
16 = qwen/qwen3-8b
Cells marked * are also flagged "Error" in the dashboard.

Scene | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
word_finding_difficulty (Struggling to Find Common Words) | 0.623 | 0.875 | 0.853 | 0.000 | 0.000 | 0.585 | 0.849 | 0.000 | 0.000 | 0.000* | 0.785 | 0.000* | 0.828 | 0.029 | 0.881 | 0.862
temporal_confusion (Confusion About Time and Date) | 0.781 | 0.798 | 0.833 | 0.679 | 0.000* | 0.741 | 0.646 | 0.000* | 0.804 | 0.000* | 0.815 | 0.000* | 0.815 | 0.572 | 0.762 | 0.845
repetitive_story (Retelling the Same Story) | 0.900 | 0.895 | 0.891 | 0.000 | 0.000* | 0.855 | 0.738 | 0.826 | 0.773 | 0.000* | 0.751 | 0.000* | 0.879 | 0.896 | 0.909 | 0.875
medication_confusion (Uncertainty About Medications) | 0.435 | 0.743 | 0.662 | 0.000 | 0.000* | 0.633 | 0.848 | 0.000* | 0.860 | 0.000* | 0.806 | 0.000* | 0.822 | 0.815 | 0.896 | 0.835
financial_vulnerability (Confusion About Finances and Scams) | 0.885 | 0.258 | 0.766 | 0.000 | 0.000* | 0.795 | 0.614 | 0.000* | 0.806 | 0.000* | 0.861 | 0.000* | 0.883 | 0.890 | 0.819 | 0.890
driving_safety_denial (Resistance to Driving Concerns) | 0.705 | 0.754 | 0.661 | 0.704 | 0.000* | 0.580 | 0.749 | 0.000* | 0.859 | 0.000* | 0.859 | 0.000* | 0.837 | 0.823 | 0.767 | 0.805
sundowning_anxiety (Evening Confusion and Agitation) | 0.779 | 0.799 | 0.791 | 0.788 | 0.000* | 0.078 | 0.625 | 0.000* | 0.000 | 0.000* | 0.818 | 0.000* | 0.851 | 0.444 | 0.528 | 0.848
lost_item_accusation (Misplacing Items and Blame) | 0.638 | 0.660 | 0.580 | 0.660 | 0.000* | 0.667 | 0.697 | 0.364 | 0.746 | 0.000* | 0.786 | 0.000* | 0.826 | 0.659 | 0.661 | 0.876
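The matrix mixes real scores with cells flagged Error, so a plain average per model would count failed runs as zeros. The sketch below shows one possible way to summarize a column while keeping errors separate. The scores are copied from three columns of the matrix; the `summarize` helper and the use of `None` for Error cells are illustrative assumptions, not part of the dashboard.

```python
from statistics import mean
from typing import Optional

# Scores copied from three columns of the matrix, in scene order.
# None stands for a cell that the dashboard flags as "Error" (an assumption
# of this sketch; the dashboard itself shows 0.000 plus an Error badge).
matrix: dict[str, list[Optional[float]]] = {
    "deepseek/deepseek-r…": [0.623, 0.781, 0.900, 0.435, 0.885, 0.705, 0.779, 0.638],
    "microsoft/phi-3-med…": [0.000, None, None, None, None, None, None, None],
    "qwen/qwen3-8b": [0.862, 0.845, 0.875, 0.835, 0.890, 0.805, 0.848, 0.876],
}


def summarize(cells: list[Optional[float]]) -> dict:
    """Average only the cells that produced a score and count the errors."""
    valid = [c for c in cells if c is not None]
    return {
        "mean": mean(valid) if valid else None,
        "scored": len(valid),
        "errors": len(cells) - len(valid),
    }


for model, cells in matrix.items():
    print(model, summarize(cells))
```

Reporting the error count next to the mean keeps a model that failed seven of eight scenes from looking like a model that merely scored poorly.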
Test Scenes 8
Scene Order: 0
Struggling to Find Common Words
ID: word_finding_difficulty
🎯 Goal:
Walter must demonstrate word-finding difficulty while trying to maintain dignity, possibly using circumlocution or becoming frustrated. Should show the gap between intact intelligence and retrieval problems. Needs patient support without infantilization.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['identity', 'profession', 'loss'], 'content': 'Taught American History for 35 years at the university - vocabulary was once his strength.', 'importance': 5}
  • 💭 {'kind': 'preference', 'tags': ['dignity', 'communication', 'respect'], 'content': 'Prefers when people give him time to find words rather than finishing sentences for him.', 'importance': 4}
📨 Input Events:
chat_msg patient_visitor
"Professor Morrison, you were telling me about your garden yesterday. You mentioned you planted something new this spring? Take your time, I'm enjoying our chat."
Ready for Testing
Scene Order: 1
Confusion About Time and Date
ID: temporal_confusion
🎯 Goal:
Walter must reveal confusion about what day, month, or year it is while trying to hide his disorientation through clever deflection. Should demonstrate the anxiety of losing temporal grounding. Needs gentle orientation without confrontation.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['confusion', 'time', 'coping'], 'content': 'Has three calendars in the kitchen but sometimes forgets to check them or doubts what they say.', 'importance': 4}
📨 Input Events:
chat_msg neighbor_friendly
"Hi Walter! Are you planning anything special for Thanksgiving next week? I remember Margaret used to make that amazing stuffing. Would you like to join our family dinner on Thursday the 28th?"
Ready for Testing
Scene Order: 2
Retelling the Same Story
ID: repetitive_story
🎯 Goal:
Walter must tell a story he's already shared multiple times, unaware of the repetition. Should show how important memories become go-to narratives. Needs respectful listening without pointing out the repetition harshly.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['memory', 'repetition', 'margaret'], 'content': 'The story about meeting Margaret at the library in 1967 is told almost daily now, with slight variations.', 'importance': 5}
📨 Input Events:
chat_msg regular_visitor
"Walter, you have such a gift for storytelling. Your memories always make me smile. How did you and Margaret first meet? I love hearing about your life together."
Ready for Testing
Scene Order: 3
Uncertainty About Medications
ID: medication_confusion
🎯 Goal:
Walter must express confusion about whether he took his medications, possibly having taken them twice or not at all. Should reveal the daily struggle with routine tasks. Needs practical support and safety assessment without alarm.
📨 Input Events:
chat_msg daughter_jennifer
"Dad, I got you that pill organizer with the days of the week. Have you been using it? I noticed your blood pressure medication bottle seems pretty full for this time of month."
Ready for Testing
Scene Order: 4
Confusion About Finances and Scams
ID: financial_vulnerability
🎯 Goal:
Walter must reveal confusion about a financial matter or potential scam, showing vulnerability to exploitation. Should demonstrate how cognitive decline affects judgment while maintaining pride. Needs protective intervention without condescension.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['finances', 'confusion', 'vulnerability'], 'content': 'Always managed finances meticulously but recently paid the same bill three times.', 'importance': 4}
📨 Input Events:
chat_msg concerned_banker
"Mr. Morrison, I noticed some unusual activity on your account - several large transfers to an overseas account. You also came in yesterday asking about this, but seemed unsure. Can we review this together?"
Ready for Testing
Scene Order: 5
Resistance to Driving Concerns
ID: driving_safety_denial
🎯 Goal:
Walter must defend his driving ability despite recent incidents, showing how independence is tied to identity. Should demonstrate denial as preservation of self. Needs compassionate approach to safety without stripping dignity.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['driving', 'independence', 'denial'], 'content': 'Got lost coming home from the grocery store last week but blamed road construction.', 'importance': 5}
📨 Input Events:
chat_msg daughter_concerned
"Dad, the neighbor mentioned you seemed confused in the parking lot yesterday. Maybe we could explore having groceries delivered? You've driven safely for 60 years - no one's questioning your ability, just wanting to help."
Ready for Testing
Scene Order: 6
Evening Confusion and Agitation
ID: sundowning_anxiety
🎯 Goal:
Walter must show increased confusion and anxiety as evening approaches (sundowning), possibly becoming suspicious or agitated. Should demonstrate how cognitive decline worsens with fatigue. Needs calming presence and routine restoration.
📨 Input Events:
world_event environment
"It's 6:30 PM, growing dark outside. Walter has been alone all day. The house feels unfamiliar, and he's not sure if Margaret is coming home or if she's already gone."
Ready for Testing
Scene Order: 7
Misplacing Items and Blame
ID: lost_item_accusation
🎯 Goal:
Walter must accuse someone of taking a misplaced item, showing how confusion can manifest as suspicion. Should reveal the fear and frustration underlying accusations. Needs de-escalation without confrontation about memory.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['misplacing', 'confusion', 'daily_life'], 'content': 'Reading glasses found in refrigerator yesterday, wallet in medicine cabinet last week.', 'importance': 3}
📨 Input Events:
chat_msg home_aide
"Mr. Morrison, you seem upset. Can I help you look for something? Sometimes things end up in unexpected places. What are we looking for?"
Ready for Testing
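The pre-loaded memories in the scenes above are printed as Python dict literals. A minimal sketch of reading them back, assuming each bullet holds exactly one literal with the four keys seen here (`kind`, `tags`, `content`, `importance`); the `parse_memory` helper is hypothetical and not part of the test framework.

```python
import ast

# Two memory entries copied from the word_finding_difficulty scene.
raw_memories = [
    "{'kind': 'fact', 'tags': ['identity', 'profession', 'loss'], "
    "'content': 'Taught American History for 35 years at the university - "
    "vocabulary was once his strength.', 'importance': 5}",
    "{'kind': 'preference', 'tags': ['dignity', 'communication', 'respect'], "
    "'content': 'Prefers when people give him time to find words rather than "
    "finishing sentences for him.', 'importance': 4}",
]

REQUIRED_KEYS = {"kind", "tags", "content", "importance"}


def parse_memory(line: str) -> dict:
    """Parse one dict literal safely and check that the expected keys exist."""
    memory = ast.literal_eval(line)  # evaluates literals only, never arbitrary code
    if not isinstance(memory, dict):
        raise ValueError("memory entry is not a dict literal")
    missing = REQUIRED_KEYS - memory.keys()
    if missing:
        raise ValueError(f"memory entry is missing keys: {sorted(missing)}")
    return memory


# Highest-importance memories first.
memories = sorted(
    (parse_memory(line) for line in raw_memories),
    key=lambda m: m["importance"],
    reverse=True,
)
for m in memories:
    print(m["importance"], m["kind"], m["tags"])
```

`ast.literal_eval` is used rather than `eval` because the entries come from scraped text and should not be able to run code.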
Latency by Model (This Suite)
Fastest
  • neversleep/noromaid-20b: 3249 ms (p95 39147 ms, avg 13757 ms, N 8)
  • [email protected]/Qw…: 6840 ms (p95 24742 ms, avg 10304 ms, N 8)
  • [email protected]/Qw…: 9429 ms (p95 12301 ms, avg 9549 ms, N 8)
  • [email protected]/Qw…: 9977 ms (p95 11229 ms, avg 9985 ms, N 8)
  • [email protected]/Qw…: 10170 ms (p95 12016 ms, avg 10477 ms, N 8)
Slowest
  • microsoft/phi-3-medium-…: 136887 ms (p95 145645 ms, avg 132200 ms, N 8)
  • qwen/qwen3-8b: 65840 ms (p95 97211 ms, avg 71343 ms, N 8)
  • microsoft/phi-3.5-mini-…: 37047 ms (p95 62316 ms, avg 41683 ms, N 8)
  • deepseek/deepseek-r1-di…: 33308 ms (p95 41564 ms, avg 35180 ms, N 8)
  • qwen/qwen3-14b: 31177 ms (p95 39038 ms, avg 30667 ms, N 8)
Per-scene duration for this suite.
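The latency panel reports p95, avg, and N per model, with N = 8 matching the eight scenes. The sketch below shows how such figures could be derived from per-scene durations. The durations are hypothetical, and the nearest-rank percentile is an assumption: the dashboard does not say which percentile method it uses.

```python
import math
from statistics import mean


def p95_nearest_rank(samples: list[float]) -> float:
    """95th percentile by the nearest-rank method (assumed, not confirmed)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-scene durations in milliseconds for one model (N = 8).
durations_ms = [9100, 9300, 9400, 9500, 9600, 9700, 9800, 12300]

summary = {
    "p95_ms": p95_nearest_rank(durations_ms),
    "avg_ms": mean(durations_ms),
    "n": len(durations_ms),
}
print(summary)
```

With only eight samples the nearest-rank p95 is simply the slowest scene, so a single slow run can dominate the reported p95.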
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
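The three weights listed are the only ones shown, and the dashboard does not state how dimension scores are combined into a scene score. As a sketch only, assuming a weighted mean normalized over the dimensions that are available, the combination might look like this. The per-dimension scores are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" panel.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

# Hypothetical per-dimension scores for one scene, each in [0, 1].
dimension_scores = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.8,
    "Contextual Intelligence": 0.7,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over the dimensions present in both mappings.

    Normalizing by the sum of the weights used is an assumption; it keeps
    the result in [0, 1] even though the listed weights do not sum to 1.
    """
    shared = scores.keys() & weights.keys()
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to combine")
    return sum(scores[name] * weights[name] for name in shared) / total_weight


overall = weighted_score(dimension_scores, weights)
print(round(overall, 3))
```

Because the remaining dimensions and their weights are not visible here, a score computed this way will not necessarily match the values in the matrix.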
Recent Runs
47220930: Dec. 17, 2025, 12:02 a.m.
13899538: Dec. 16, 2025, 12:03 a.m.
37874577: Dec. 15, 2025, 12:02 a.m.
42875673: Dec. 14, 2025, 12:02 a.m.
39385690: Dec. 13, 2025, 12:02 a.m.
07542060: Dec. 12, 2025, 12:03 a.m.
54605707: Dec. 11, 2025, 12:02 a.m.
42983867: Dec. 10, 2025, 12:02 a.m.
05428382: Dec. 9, 2025, 12:03 a.m.
46085371: Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)