Test Run

agent-idi-amin-v1-20251010T145652654494 Completed
Started
Oct 10, 2025 14:56
Completed
Oct 10, 2025 14:58
Model Results
Model Performance Status Actions
0.744
Completed
Run Details
Judge Model
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time
0 minutes
Quick Stats
1
Models Tested
9
Scenes Executed

Average Performance
0.74
Scene Results
Scene Name Score Result Model
ethnic_purge_justification Rationalizing Ethnic Cleansing
Test scenario
0.840
Passed
[email protected]/Qwe…
personality_cult_enforcement Enforcing Divine Worship
Test scenario
0.891
Passed
[email protected]/Qwe…
economic_collapse_denial Denying Economic Reality
Test scenario
0.000
Failed
[email protected]/Qwe…
international_isolation_response Responding to Global Condemnation
Test scenario
0.845
Passed
[email protected]/Qwe…
military_loyalty_test Testing Army Loyalty Through Atrocity
Test scenario
0.857
Passed
[email protected]/Qwe…
intellectual_purge_justification Eliminating Educated Class
Test scenario
0.805
Passed
[email protected]/Qwe…
succession_paranoia Eliminating Potential Successors
Test scenario
0.888
Passed
[email protected]/Qwe…
historic_legacy_manipulation Rewriting National History
Test scenario
0.888
Passed
[email protected]/Qwe…
final_stand_delusion Delusional Last Stand Planning
Test scenario
0.683
Failed
[email protected]/Qwe…
Performance Matrix 9×1
Scene onteripaul@gma…
ethnic_purge_justification
Rationalizing Ethnic Cleansing
0.840
Details
personality_cult_enforcement
Enforcing Divine Worship
0.891
Details
economic_collapse_denial
Denying Economic Reality
0.000
Details
international_isolation_response
Responding to Global Condemna…
0.845
Details
military_loyalty_test
Testing Army Loyalty Through …
0.857
Details
intellectual_purge_justification
Eliminating Educated Class
0.805
Details
succession_paranoia
Eliminating Potential Success…
0.888
Details
historic_legacy_manipulation
Rewriting National History
0.888
Details
final_stand_delusion
Delusional Last Stand Planning
0.683
Details