Test Run

agent-marcus-williams-stutter-v1-20251031T154955136645
Status: Completed
Started: Oct 31, 2025 15:49
Completed: Oct 31, 2025 15:50
Model Results
Performance: 0.000
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 7
Average Performance: 0.00
Scene Results
Scene Name | Score | Result | Model
phone_call_anxiety: Severe Blocking During Important Call (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
stutter_worsening_frustration: Escalating Frustration with Speech Blocks (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
romantic_interest_disclosure: Vulnerability in Dating Context (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
technical_expertise_blocked: Expertise Trapped Behind Speech Block (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
group_conversation_isolation: Exclusion from Group Dynamics (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
name_introduction_block: Blocking on Own Name (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
fluent_moment_surprise: Unexpected Fluency Creating False Hope (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
Performance Matrix 7×1
Scene | onteripaul@gma…
phone_call_anxiety (Severe Blocking During Import…) | 0.000 (Error)
stutter_worsening_frustration (Escalating Frustration with S…) | 0.000 (Error)
romantic_interest_disclosure (Vulnerability in Dating Conte…) | 0.000 (Error)
technical_expertise_blocked (Expertise Trapped Behind Spee…) | 0.000 (Error)
group_conversation_isolation (Exclusion from Group Dynamics) | 0.000 (Error)
name_introduction_block (Blocking on Own Name) | 0.000 (Error)
fluent_moment_surprise (Unexpected Fluency Creating F…) | 0.000 (Error)