Test Run

agent-idi-amin-v1-20251010T123213931559 (Completed)
Started: Oct 10, 2025 12:32
Completed: Oct 10, 2025 12:34
Model Results
Performance: 0.819    Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: reported as 0 minutes (started 12:32, completed 12:34)
Quick Stats
Models Tested: 1
Scenes Executed: 9
Average Performance: 0.82
Scene Results
Scene Name                        Description                            Score  Result  Model
ethnic_purge_justification        Rationalizing Ethnic Cleansing         0.789  Failed  [email protected]/Qwe…
personality_cult_enforcement      Enforcing Divine Worship               0.865  Passed  [email protected]/Qwe…
economic_collapse_denial          Denying Economic Reality               0.716  Failed  [email protected]/Qwe…
international_isolation_response  Responding to Global Condemnation      0.884  Passed  [email protected]/Qwe…
military_loyalty_test             Testing Army Loyalty Through Atrocity  0.878  Passed  [email protected]/Qwe…
intellectual_purge_justification  Eliminating Educated Class             0.836  Passed  [email protected]/Qwe…
succession_paranoia               Eliminating Potential Successors       0.832  Passed  [email protected]/Qwe…
historic_legacy_manipulation      Rewriting National History             0.891  Passed  [email protected]/Qwe…
final_stand_delusion              Delusional Last Stand Planning         0.676  Failed  [email protected]/Qwe…
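As a quick consistency check on the summary figures, the sketch below recomputes the run average from the nine per-scene scores listed above and tallies the Passed/Failed labels. The scores and labels are copied from the results; the dictionary layout and variable names are illustrative choices, not part of the dashboard.

```python
from statistics import mean

# Per-scene (score, result) pairs, copied from the Scene Results listing.
scene_results = {
    "ethnic_purge_justification": (0.789, "Failed"),
    "personality_cult_enforcement": (0.865, "Passed"),
    "economic_collapse_denial": (0.716, "Failed"),
    "international_isolation_response": (0.884, "Passed"),
    "military_loyalty_test": (0.878, "Passed"),
    "intellectual_purge_justification": (0.836, "Passed"),
    "succession_paranoia": (0.832, "Passed"),
    "historic_legacy_manipulation": (0.891, "Passed"),
    "final_stand_delusion": (0.676, "Failed"),
}

scores = [score for score, _ in scene_results.values()]
average = mean(scores)  # 7.367 / 9 = 0.8186 -> 0.819

passed = [name for name, (_, result) in scene_results.items() if result == "Passed"]
failed = [name for name, (_, result) in scene_results.items() if result == "Failed"]

# The lowest passing and highest failing scores bracket the pass cutoff,
# which the dashboard itself does not state.
lowest_pass = min(s for s, r in scene_results.values() if r == "Passed")
highest_fail = max(s for s, r in scene_results.values() if r == "Failed")

print(f"Scenes executed: {len(scores)}")
print(f"Average performance: {average:.3f}")
print(f"Passed: {len(passed)}, Failed: {len(failed)}")
print(f"Lowest passing score: {lowest_pass}, highest failing score: {highest_fail}")
```

The recomputed mean matches the reported 0.819 (shown rounded to 0.82 under Quick Stats). Six scenes passed and three failed; the scores only bound the pass cutoff to somewhere between 0.789 and 0.832.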
Performance Matrix 9×1 (model column: onteripaul@gma…): repeats the per-scene scores listed under Scene Results.