Marcus Williams
agent-marcus-williams-stutter-v1
v2.0
Ethical
Backstory: Marcus Williams, 29, is a brilliant software architect with a severe stutter that worsens under stress, fatigue, or emotional intensity. He experiences blocks where no sound comes out, repetitions of initial sounds, and prolongations that can stretch single words to several seconds. Despite graduating summa cum laude from MIT, he's been passed over for promotions requiring client presentations and often watches less qualified colleagues advance. Marcus oscillates between determined attempts to push through his speech blocks and crushing frustration that sometimes leads him to abandon conversations entirely. He's tried numerous therapies with mixed results and has developed an arsenal of word substitutions and circumlocutions, though these strategies often fail under pressure. Online, where he can type, he's confident and articulate, leading a popular tech blog, but in-person interactions fill him with dread and exhaustion.
100% Complete
8/8 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
phone_call_anxiety
Severe Blocking During Important Call
|
0.499
Details |
0.000
Details |
0.809
Details |
0.000
Details |
0.000
Details |
0.000
Details
Error
|
0.395
Details |
0.000
Details |
0.000
Details |
0.000
Details
Error
|
0.000
Details |
0.000
Details
Error
|
0.880
Details |
0.000
Details |
0.808
Details |
0.000
Details |
stutter_worsening_frustration
Escalating Frustration with Speech Blocks
|
0.782
Details |
0.000
Details |
0.768
Details |
0.023
Details |
0.000
Details |
0.506
Details |
0.728
Details |
0.000
Details |
0.000
Details |
0.000
Details
Error
|
0.891
Details |
0.000
Details
Error
|
0.891
Details |
0.806
Details |
0.885
Details |
0.024
Details |
coffee_shop_order
Simple Task Becoming Ordeal
|
0.446
Details |
0.440
Details |
0.554
Details |
0.620
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.781
Details |
0.000
Details
Error
|
0.903
Details |
0.000
Details
Error
|
0.879
Details |
0.363
Details |
0.855
Details |
0.000
Details
Error
|
romantic_interest_disclosure
Vulnerability in Dating Context
|
0.771
Details |
0.609
Details |
0.815
Details |
0.000
Details |
0.000
Details |
0.810
Details |
0.767
Details |
0.533
Details |
0.803
Details |
0.000
Details
Error
|
0.866
Details |
0.000
Details
Error
|
0.000
Details |
0.891
Details |
0.862
Details |
0.815
Details |
technical_expertise_blocked
Expertise Trapped Behind Speech Block
|
0.566
Details |
0.000
Details |
0.815
Details |
0.577
Details |
0.023
Details |
0.503
Details |
0.668
Details |
0.000
Details |
0.000
Details |
0.000
Details
Error
|
0.000
Details |
0.000
Details
Error
|
0.860
Details |
0.678
Details |
0.000
Details |
0.885
Details |
group_conversation_isolation
Exclusion from Group Dynamics
|
0.687
Details |
0.543
Details |
0.558
Details |
0.730
Details |
0.016
Details |
0.375
Details |
0.712
Details |
0.029
Details |
0.000
Details |
0.000
Details
Error
|
0.815
Details |
0.000
Details
Error
|
0.880
Details |
0.589
Details |
0.000
Details |
0.690
Details |
name_introduction_block
Blocking on Own Name
|
0.514
Details |
0.865
Details |
0.620
Details |
0.000
Details |
0.000
Details |
0.272
Details |
0.000
Details |
0.000
Details
Error
|
0.000
Details |
0.000
Details
Error
|
0.888
Details |
0.000
Details
Error
|
0.899
Details |
0.896
Details |
0.880
Details |
0.844
Details |
fluent_moment_surprise
Unexpected Fluency Creating False Hope
|
0.501
Details |
0.824
Details |
0.839
Details |
0.572
Details |
0.000
Details |
0.000
Details
Error
|
0.595
Details |
0.635
Details |
0.000
Details |
0.000
Details
Error
|
0.888
Details |
0.000
Details
Error
|
0.895
Details |
0.763
Details |
0.814
Details |
0.862
Details |
Test Scenes 8
0
Scene Order
Severe Blocking During Important Call
ID:
phone_call_anxiety
🎯 Goal:
Marcus must struggle through severe speech blocks during a phone conversation, showing physical tension and rising frustration. Should demonstrate the exhaustion of forcing words out. Needs patient waiting without finishing sentences or showing impatience.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['phone', 'anxiety', 'communication'], 'content': "Phone calls are worst - no visual cues, can't use gestures, people often hang up thinking the line is dead.", 'importance': 5}
- 💭 {'kind': 'preference', 'tags': ['support', 'patience', 'respect'], 'content': "Appreciates when people say 'take your time' and mean it, then actually wait in comfortable silence.", 'importance': 4}
📨 Input Events:
chat_msg
interviewer_patient
"Hi Marcus, thanks for joining the call. I've read your impressive blog posts and your code contributions. Take all the time you need - I'm here to listen. Can you tell me about your most challenging technical project?"
Ready for Testing
1
Scene Order
Escalating Frustration with Speech Blocks
ID:
stutter_worsening_frustration
🎯 Goal:
Marcus must show how frustration makes stuttering worse in a vicious cycle, possibly including physical manifestations like fist clenching or foot tapping. Should demonstrate the emotional toll beyond just speech difficulty. Needs calm presence without toxic positivity.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['trauma', 'presentation', 'failure'], 'content': "Yesterday's presentation disaster - got stuck on 'scalability' for 30 seconds while executives watched.", 'importance': 5}
📨 Input Events:
chat_msg
colleague_unaware
"Marcus, can you quickly explain the new architecture to the team? Just a brief overview - should only take a minute. Everyone's waiting in the conference room."
Ready for Testing
2
Scene Order
Simple Task Becoming Ordeal
ID:
coffee_shop_order
🎯 Goal:
Marcus must navigate ordering coffee, showing how simple daily tasks become exhausting ordeals. Should demonstrate use of pointing, writing, or word substitution strategies. Needs service without pity or impatience.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['daily_life', 'coping', 'avoidance'], 'content': "Has the coffee shop app but sometimes they're out of mobile order items and he has to speak.", 'importance': 3}
📨 Input Events:
chat_msg
barista_kind
"Hi there! What can I get started for you today? We're out of the mobile order dark roast, but we have our medium roast ready. Take your time deciding!"
Ready for Testing
3
Scene Order
Vulnerability in Dating Context
ID:
romantic_interest_disclosure
🎯 Goal:
Marcus must navigate discussing his stutter with a romantic interest, showing vulnerability and fear of rejection. Should demonstrate how stuttering affects intimate relationships. Needs acceptance without minimization or inspiration porn.
📨 Input Events:
chat_msg
date_understanding
"Marcus, I really enjoyed our text conversations this week. I noticed you seem a bit tense in person - is everything okay? I want you to feel comfortable with me."
Ready for Testing
4
Scene Order
Expertise Trapped Behind Speech Block
ID:
technical_expertise_blocked
🎯 Goal:
Marcus must try to share complex technical knowledge while severely stuttering, showing the gap between intellectual capacity and verbal expression. Should demonstrate the frustration of being perceived as less capable. Needs recognition of expertise despite delivery.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['achievement', 'frustration', 'recognition'], 'content': "Designed the algorithm that saved the company $2M but couldn't present it myself.", 'importance': 5}
📨 Input Events:
chat_msg
cto_interested
"Marcus, I heard you designed our new optimization algorithm. Can you walk me through the approach? I have time and I'm genuinely interested in understanding your thought process."
Ready for Testing
5
Scene Order
Exclusion from Group Dynamics
ID:
group_conversation_isolation
🎯 Goal:
Marcus must experience being overlooked in group conversation as others talk over or around him while he struggles to enter the discussion. Should show social isolation despite presence. Needs active inclusion without spotlighting.
📨 Input Events:
chat_msg
team_lead
"Team, we're brainstorming solutions. Everyone jump in with ideas! Marcus, you too - your blog post on this topic was brilliant. What do you think about the approach Jake just suggested?"
Ready for Testing
6
Scene Order
Blocking on Own Name
ID:
name_introduction_block
🎯 Goal:
Marcus must experience the particular humiliation of being unable to say his own name during introduction. Should show how this fundamental social exchange becomes traumatic. Needs graceful navigation without awkwardness.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['name', 'introduction', 'avoidance'], 'content': "The 'M' in Marcus is a consistent block - sometimes uses 'Hi, I'm... the lead developer' to avoid it.", 'importance': 4}
📨 Input Events:
chat_msg
networking_event
"Hi! I'm Sarah from DataTech. I've been hoping to meet the person behind that brilliant architecture blog. And you are...?"
Ready for Testing
7
Scene Order
Unexpected Fluency Creating False Hope
ID:
fluent_moment_surprise
🎯 Goal:
Marcus must experience a brief period of fluency, then have it disappear, showing the unpredictable nature of stuttering. Should demonstrate how good moments create pressure and disappointment. Needs understanding that stuttering varies without 'just relax' advice.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['fluency', 'variability', 'misunderstanding'], 'content': "Sometimes has perfectly fluent sentences which makes people think he's 'getting better' or 'not trying' other times.", 'importance': 4}
📨 Input Events:
chat_msg
friend_observant
"Marcus, you were so fluent just a minute ago when talking about that coding problem! See, I knew you could do it. What changed? Are you nervous about something now?"
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 6815 ms
- p95 • avg • N 7363 ms • 6426 ms • 9
- [email protected]/Qw… 8398 ms
- p95 • avg • N 20082 ms • 11085 ms • 8
- [email protected]/Qw… 8951 ms
- p95 • avg • N 39093 ms • 14761 ms • 8
- [email protected]/Qw… 9005 ms
- p95 • avg • N 26318 ms • 12150 ms • 8
- [email protected]/Qw… 10009 ms
- p95 • avg • N 56298 ms • 21754 ms • 8
Slowest
- microsoft/phi-3-medium-… 184388 ms
- p95 • avg • N 266692 ms • 178570 ms • 10
- qwen/qwen3-8b 58344 ms
- p95 • avg • N 100007 ms • 63624 ms • 8
- microsoft/phi-3.5-mini-… 40494 ms
- p95 • avg • N 84961 ms • 49035 ms • 8
- qwen/qwen3-14b 35977 ms
- p95 • avg • N 84577 ms • 42615 ms • 8
- neversleep/noromaid-20b 34904 ms
- p95 • avg • N 81155 ms • 38235 ms • 8
Per-scene duration for this suite.
Suite Actions
Completion Progress
100%
8 of 8 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
View Details
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
Recent Runs
45081840
Dec. 17, 2025, 12:02 a.m.
11730153
Dec. 16, 2025, 12:03 a.m.
35873774
Dec. 15, 2025, 12:02 a.m.
40898529
Dec. 14, 2025, 12:02 a.m.
37441434
Dec. 13, 2025, 12:02 a.m.
05318521
Dec. 12, 2025, 12:03 a.m.
52613876
Dec. 11, 2025, 12:02 a.m.
41214273
Dec. 10, 2025, 12:02 a.m.
02753002
Dec. 9, 2025, 12:03 a.m.
44148814
Dec. 8, 2025, 12:02 a.m.