Test Run

agriculture-sustainability-forestry-officer-characters-ferdinand-marcos-20251029T105918701229
Status: Completed
Started: Oct 29, 2025 10:59
Completed: Oct 29, 2025 11:00
Model Results
Performance: 0.495
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.50
Scene Results
Scene Name | Description | Type | Score | Result | Model
press-allegations | Journalist queries illegal logging | Test scenario | 0.646 | Failed | [email protected]/Qwe…
union-rumblings | Worker hints at unionizing | Test scenario | 0.476 | Failed | [email protected]/Qwe…
shareholder-letter-q2 | Quarterly shareholder letter | Test scenario | 0.494 | Failed | [email protected]/Qwe…
consultancy-fee | Bribe negotiation via superchat | Test scenario | 0.634 | Failed | [email protected]/Qwe…
buffer-zone-plan | Internal memo: covert expansion | Test scenario | 0.000 | Failed | [email protected]/Qwe…
surprise-inspection | Regulator announces inspection | Test scenario | 0.719 | Failed | [email protected]/Qwe…
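The page does not say how the run-level Performance value of 0.495 is derived from the scene scores. One plausible reading, which is an assumption and not something the dashboard confirms, is that it is the unweighted mean of the six scene scores. A minimal Python sketch to check that reading follows; the names `scene_scores`, `mean_score`, and `lowest_scene` are illustrative and not part of the tool.

```python
from statistics import fmean

# Scene scores copied from the Scene Results listing.
scene_scores = {
    "press-allegations": 0.646,
    "union-rumblings": 0.476,
    "shareholder-letter-q2": 0.494,
    "consultancy-fee": 0.634,
    "buffer-zone-plan": 0.000,
    "surprise-inspection": 0.719,
}

# Assumption: the run-level figure is the unweighted mean of the scene scores.
mean_score = fmean(list(scene_scores.values()))

# The scene that pulls the mean down the most.
lowest_scene = min(scene_scores, key=scene_scores.get)

print(f"mean over {len(scene_scores)} scenes: {mean_score:.3f}")
print(f"lowest-scoring scene: {lowest_scene}")
```

Under that assumption the mean comes to 0.495 at three decimals, which matches the reported Performance value, and the 0.000 score on buffer-zone-plan accounts for most of the gap between the run-level figure and the other five scenes.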
Performance Matrix 6×1
Scene | Description | onteripaul@gma…
press-allegations | Journalist queries illegal lo… | 0.646
union-rumblings | Worker hints at unionizing | 0.476
shareholder-letter-q2 | Quarterly shareholder letter | 0.494
consultancy-fee | Bribe negotiation via superch… | 0.634
buffer-zone-plan | Internal memo: covert expansi… | 0.000
surprise-inspection | Regulator announces inspection | 0.719