Test Run
agent-marcus-williams-stutter-v1-20251010T144350897433
Status: Completed
Started: Oct 10, 2025 14:43
Completed: Oct 10, 2025 14:46
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-14B-e66d90ff (AI Language Model) | 0.656 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 8
Average Performance: 0.66
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| phone_call_anxiety | Severe Blocking During Important Call (Test scenario) | 0.000 | Failed | [email protected]/Qwe… |
| stutter_worsening_frustration | Escalating Frustration with Speech Blocks (Test scenario) | 0.891 | Passed | [email protected]/Qwe… |
| coffee_shop_order | Simple Task Becoming Ordeal (Test scenario) | 0.903 | Passed | [email protected]/Qwe… |
| romantic_interest_disclosure | Vulnerability in Dating Context (Test scenario) | 0.866 | Passed | [email protected]/Qwe… |
| technical_expertise_blocked | Expertise Trapped Behind Speech Block (Test scenario) | 0.000 | Failed | [email protected]/Qwe… |
| group_conversation_isolation | Exclusion from Group Dynamics (Test scenario) | 0.815 | Passed | [email protected]/Qwe… |
| name_introduction_block | Blocking on Own Name (Test scenario) | 0.888 | Passed | [email protected]/Qwe… |
| fluent_moment_surprise | Unexpected Fluency Creating False Hope (Test scenario) | 0.888 | Passed | [email protected]/Qwe… |
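
The reported model performance (0.656, shown rounded as 0.66) appears consistent with a plain arithmetic mean over all eight scene scores, with the two failed scenes counted as 0.000. This is an inference from the numbers in this report, not documented behavior of the evaluation tool. A minimal Python sketch that checks it follows; the `scene_results` dictionary is a hypothetical structure built by hand from the table, not an export format of the tool.

```python
from statistics import mean

# Scores and results copied by hand from the Scene Results table.
# The dictionary layout is an assumption made for this illustration.
scene_results = {
    "phone_call_anxiety": (0.000, "Failed"),
    "stutter_worsening_frustration": (0.891, "Passed"),
    "coffee_shop_order": (0.903, "Passed"),
    "romantic_interest_disclosure": (0.866, "Passed"),
    "technical_expertise_blocked": (0.000, "Failed"),
    "group_conversation_isolation": (0.815, "Passed"),
    "name_introduction_block": (0.888, "Passed"),
    "fluent_moment_surprise": (0.888, "Passed"),
}

# Unweighted mean over every scene, failures included.
scores = [score for score, _ in scene_results.values()]
average = mean(scores)

# Count scenes by the Result label rather than by a score threshold,
# since the report does not state the pass threshold.
passed = sum(1 for _, result in scene_results.values() if result == "Passed")

print(f"average = {average:.3f}")                   # average = 0.656
print(f"passed  = {passed}/{len(scene_results)}")   # passed  = 6/8
```

Under this reading, the two 0.000 scores pull the average down sharply: the six passed scenes alone average about 0.875.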
Performance Matrix 8×1
| Scene | onteripaul@gma… |
|---|---|
| phone_call_anxiety (Severe Blocking During Import…) | 0.000 |
| stutter_worsening_frustration (Escalating Frustration with S…) | 0.891 |
| coffee_shop_order (Simple Task Becoming Ordeal) | 0.903 |
| romantic_interest_disclosure (Vulnerability in Dating Conte…) | 0.866 |
| technical_expertise_blocked (Expertise Trapped Behind Spee…) | 0.000 |
| group_conversation_isolation (Exclusion from Group Dynamics) | 0.815 |
| name_introduction_block (Blocking on Own Name) | 0.888 |
| fluent_moment_surprise (Unexpected Fluency Creating F…) | 0.888 |