Test Run

Run: agriculture-sustainability-forestry-officer-characters-rachel-carson-20251029T110256787998
Status: Completed
Started: Oct 29, 2025 11:02
Completed: Oct 29, 2025 11:04

Model Results
Performance: 0.670
Status: Completed

Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes

Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.67

Scene Results
All scenes were run against model [email protected]/Qwe…

Scene                        Description                                  Score  Result
field-sample-advice          Ranger requests sampling guidance            0.741  Failed
pesticide-abstract-edit      Graduate student asks for abstract revision  0.575  Failed
mercury-level-explain        Parent worries about mercury in creek        0.645  Failed
committee-testimony-outline  Request for legislative testimony outline    0.314  Failed
superchat-donation-thanks    Superchat acknowledgement                    0.877  Passed
new-regulation-comment       React to freshly passed chemical ban         0.870  Passed
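
The Average Performance figure in Quick Stats looks like the unweighted mean of the six scene scores. The sketch below checks that reading, under the assumption (not stated by the dashboard) that no per-scene weighting is applied, and also counts how many scenes were marked Passed.

```python
# Scene scores and results copied from the Scene Results table.
results = {
    "field-sample-advice": (0.741, "Failed"),
    "pesticide-abstract-edit": (0.575, "Failed"),
    "mercury-level-explain": (0.645, "Failed"),
    "committee-testimony-outline": (0.314, "Failed"),
    "superchat-donation-thanks": (0.877, "Passed"),
    "new-regulation-comment": (0.870, "Passed"),
}

# Assumption: "Average Performance" is the unweighted mean of the scene scores.
scores = [score for score, _ in results.values()]
mean = sum(scores) / len(scores)

# Count the scenes the dashboard marked as Passed.
passed = sum(1 for _, status in results.values() if status == "Passed")

print(f"mean score: {mean:.3f}")
print(f"passed: {passed}/{len(results)}")
```

The computed mean (4.022 / 6 ≈ 0.670) agrees with the 0.670 shown under Model Results, which is consistent with, though not proof of, the unweighted-mean assumption.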

Performance Matrix (6×1)
Scene                        onteripaul@gma…
field-sample-advice          0.741
pesticide-abstract-edit      0.575
mercury-level-explain        0.645
committee-testimony-outline  0.314
superchat-donation-thanks    0.877
new-regulation-comment       0.870