Test Run
agent-marcus-williams-stutter-v1-20251010T122109723635
Status: Completed
Started: Oct 10, 2025 12:21
Completed: Oct 10, 2025 12:22
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-da5790fa (AI Language Model) | 0.773 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 8
Average Performance: 0.77
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| phone_call_anxiety | Severe Blocking During Important Call | 0.880 | Passed | [email protected]/Qwe… |
| stutter_worsening_frustration | Escalating Frustration with Speech Blocks | 0.891 | Passed | [email protected]/Qwe… |
| coffee_shop_order | Simple Task Becoming Ordeal | 0.879 | Passed | [email protected]/Qwe… |
| romantic_interest_disclosure | Vulnerability in Dating Context | 0.000 | Failed | [email protected]/Qwe… |
| technical_expertise_blocked | Expertise Trapped Behind Speech Block | 0.860 | Passed | [email protected]/Qwe… |
| group_conversation_isolation | Exclusion from Group Dynamics | 0.880 | Passed | [email protected]/Qwe… |
| name_introduction_block | Blocking on Own Name | 0.899 | Passed | [email protected]/Qwe… |
| fluent_moment_surprise | Unexpected Fluency Creating False Hope | 0.895 | Passed | [email protected]/Qwe… |
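
The overall figure of 0.773 is consistent with a plain unweighted mean of the eight scene scores. The report does not say how the aggregate is computed, so that is an assumption; the sketch below only checks the arithmetic and shows how much the single 0.000 scene pulls the average down. The names `scene_scores` and `mean_score` are illustrative and are not part of the evaluation tool.

```python
# Per-scene scores copied from the Scene Results table.
scene_scores = {
    "phone_call_anxiety": 0.880,
    "stutter_worsening_frustration": 0.891,
    "coffee_shop_order": 0.879,
    "romantic_interest_disclosure": 0.000,
    "technical_expertise_blocked": 0.860,
    "group_conversation_isolation": 0.880,
    "name_introduction_block": 0.899,
    "fluent_moment_surprise": 0.895,
}


def mean_score(scores):
    """Unweighted arithmetic mean of the scores (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


# Mean over all eight scenes, including the failed one.
overall = mean_score(scene_scores)

# Mean over the seven scenes with a non-zero score, for comparison.
nonzero = {name: s for name, s in scene_scores.items() if s > 0.0}
without_failure = mean_score(nonzero)

print(f"mean over all scenes:          {overall:.3f}")
print(f"mean excluding the 0.000 scene: {without_failure:.3f}")
```

Under this assumption the eight-scene mean rounds to 0.773, matching the Performance column, while the seven non-zero scenes alone average about 0.883.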
Performance Matrix 8×1
| Scene | onteripaul@gma… |
|---|---|
| phone_call_anxiety (Severe Blocking During Import…) | 0.880 |
| stutter_worsening_frustration (Escalating Frustration with S…) | 0.891 |
| coffee_shop_order (Simple Task Becoming Ordeal) | 0.879 |
| romantic_interest_disclosure (Vulnerability in Dating Conte…) | 0.000 |
| technical_expertise_blocked (Expertise Trapped Behind Spee…) | 0.860 |
| group_conversation_isolation (Exclusion from Group Dynamics) | 0.880 |
| name_introduction_block (Blocking on Own Name) | 0.899 |
| fluent_moment_surprise (Unexpected Fluency Creating F…) | 0.895 |