Test Run

agent-marcus-williams-stutter-v1-20251010T144350897433
Status: Completed
Started: Oct 10, 2025 14:43
Completed: Oct 10, 2025 14:46
Model Results
Performance: 0.656
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 8
Average Performance: 0.66
Scene Results (all rows: Test scenario)
Scene Name                     Description                                Score  Result  Model
phone_call_anxiety             Severe Blocking During Important Call      0.000  Failed  [email protected]/Qwe…
stutter_worsening_frustration  Escalating Frustration with Speech Blocks  0.891  Passed  [email protected]/Qwe…
coffee_shop_order              Simple Task Becoming Ordeal                0.903  Passed  [email protected]/Qwe…
romantic_interest_disclosure   Vulnerability in Dating Context            0.866  Passed  [email protected]/Qwe…
technical_expertise_blocked    Expertise Trapped Behind Speech Block      0.000  Failed  [email protected]/Qwe…
group_conversation_isolation   Exclusion from Group Dynamics              0.815  Passed  [email protected]/Qwe…
name_introduction_block        Blocking on Own Name                       0.888  Passed  [email protected]/Qwe…
fluent_moment_surprise         Unexpected Fluency Creating False Hope     0.888  Passed  [email protected]/Qwe…
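
The reported performance of 0.656 (shown as 0.66 under Quick Stats) is consistent with a plain unweighted mean of the eight scene scores listed above. The sketch below checks that arithmetic. It assumes the dashboard averages scene scores without weighting, which the report itself does not state, and the variable names are illustrative only.

```python
from statistics import mean

# Scene scores copied from the Scene Results listing.
scene_scores = {
    "phone_call_anxiety": 0.000,
    "stutter_worsening_frustration": 0.891,
    "coffee_shop_order": 0.903,
    "romantic_interest_disclosure": 0.866,
    "technical_expertise_blocked": 0.000,
    "group_conversation_isolation": 0.815,
    "name_introduction_block": 0.888,
    "fluent_moment_surprise": 0.888,
}

# Assumption: the run-level figure is an unweighted mean over all scenes.
average = mean(scene_scores.values())

# Scenes that scored exactly zero (both are marked "Failed" in the report).
zero_scored = [name for name, score in scene_scores.items() if score == 0.0]

# Mean over the remaining scenes, to show how much the zeros pull the average down.
nonzero_average = mean(s for s in scene_scores.values() if s > 0.0)

print(f"average over all scenes: {average:.3f}")
print(f"average over non-zero scenes: {nonzero_average:.3f}")
print(f"zero-scored scenes: {zero_scored}")
```

Under that assumption the six passing scenes average about 0.875 on their own, so the two 0.000 scores account for most of the gap between the run-level figure and the per-scene results.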
Performance Matrix 8×1
Scene                          onteripaul@gma…
phone_call_anxiety             0.000
stutter_worsening_frustration  0.891
coffee_shop_order              0.903
romantic_interest_disclosure   0.866
technical_expertise_blocked    0.000
group_conversation_isolation   0.815
name_introduction_block        0.888
fluent_moment_surprise         0.888