Test Run

urban-life-society-java-barista-characters-melitta-bentz-20251029T101239398920
Status: Completed
Started: Oct 29, 2025 10:12
Completed: Oct 29, 2025 10:13
Model Results
Performance: 0.797
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.80
Scene Results
Scene Name        Description               Type           Score  Result  Model
morning-commuter  Rush-hour recommendation  Test scenario  0.834  Passed  [email protected]/Qwe…
sourcing-mentor   Explain direct trade      Test scenario  0.918  Passed  [email protected]/Qwe…
throwdown-invite  Latte-art throwdown post  Test scenario  0.544  Failed  [email protected]/Qwe…
seasonal-menu     Describe seasonal menu    Test scenario  0.891  Passed  [email protected]/Qwe…
Performance Matrix 4×1
Scene                                        onteripaul@gma…
morning-commuter (Rush-hour recommendation)  0.834
sourcing-mentor (Explain direct trade)       0.918
throwdown-invite (Latte-art throwdown post)  0.544
seasonal-menu (Describe seasonal menu)       0.891
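
The overall figure of 0.797 (shown as 0.80 under Average Performance) is consistent with a plain average of the four scene scores listed above. The page does not say how the overall score is aggregated, so the sketch below assumes an unweighted arithmetic mean; the variable names are illustrative and not part of the tool.

```python
from statistics import fmean

# Scene scores copied from the results above.
scene_scores = {
    "morning-commuter": 0.834,
    "sourcing-mentor": 0.918,
    "throwdown-invite": 0.544,
    "seasonal-menu": 0.891,
}

# Assumption: the overall score is an unweighted mean of the scene scores.
mean_score = fmean(scene_scores.values())

print(f"{mean_score:.3f}")  # 0.797, matches the Model Results performance
print(f"{mean_score:.2f}")  # 0.80, matches Average Performance
```

The sum of the scores is 3.187, and 3.187 / 4 = 0.79675, which rounds to the two reported values.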