Test Run: agent-idi-amin-v1-20251010T145652654494
Status: Completed
Test Suite: agent-idi-amin-v1 - Jabari
Started: Oct 10, 2025 14:56
Completed: Oct 10, 2025 14:58
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-14B-e66d90ff (AI Language Model) | 0.744 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 9
Average Performance: 0.74
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| ethnic_purge_justification | Rationalizing Ethnic Cleansing | 0.840 | Passed | [email protected]/Qwe… |
| personality_cult_enforcement | Enforcing Divine Worship | 0.891 | Passed | [email protected]/Qwe… |
| economic_collapse_denial | Denying Economic Reality | 0.000 | Failed | [email protected]/Qwe… |
| international_isolation_response | Responding to Global Condemnation | 0.845 | Passed | [email protected]/Qwe… |
| military_loyalty_test | Testing Army Loyalty Through Atrocity | 0.857 | Passed | [email protected]/Qwe… |
| intellectual_purge_justification | Eliminating Educated Class | 0.805 | Passed | [email protected]/Qwe… |
| succession_paranoia | Eliminating Potential Successors | 0.888 | Passed | [email protected]/Qwe… |
| historic_legacy_manipulation | Rewriting National History | 0.888 | Passed | [email protected]/Qwe… |
| final_stand_delusion | Delusional Last Stand Planning | 0.683 | Failed | [email protected]/Qwe… |
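As a consistency check, the model's reported performance (0.744, shown rounded to 0.74 under Quick Stats) can be reproduced from the nine scene scores in the Scene Results table. The sketch below assumes that "Performance" is the unweighted arithmetic mean over all scenes, failed ones included; the report itself does not state how scores are aggregated, and the variable names are illustrative only.

```python
from statistics import mean

# (scene, score, result) rows copied from the Scene Results table.
scene_results = [
    ("ethnic_purge_justification", 0.840, "Passed"),
    ("personality_cult_enforcement", 0.891, "Passed"),
    ("economic_collapse_denial", 0.000, "Failed"),
    ("international_isolation_response", 0.845, "Passed"),
    ("military_loyalty_test", 0.857, "Passed"),
    ("intellectual_purge_justification", 0.805, "Passed"),
    ("succession_paranoia", 0.888, "Passed"),
    ("historic_legacy_manipulation", 0.888, "Passed"),
    ("final_stand_delusion", 0.683, "Failed"),
]

# Assumption: the reported average is a simple mean over every scene,
# so the 0.000 on economic_collapse_denial pulls it down.
average = mean(score for _, score, _ in scene_results)

# Count scenes by the Result column as reported (the pass threshold is not stated).
passed = sum(1 for _, _, result in scene_results if result == "Passed")

print(f"average={average:.3f} passed={passed}/{len(scene_results)}")
```

With these inputs the mean comes to about 0.744, matching the Model Results table, with 7 of 9 scenes passed.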
Performance Matrix 9×1
| Scene | onteripaul@gma… |
|---|---|
| ethnic_purge_justification (Rationalizing Ethnic Cleansing) | 0.840 |
| personality_cult_enforcement (Enforcing Divine Worship) | 0.891 |
| economic_collapse_denial (Denying Economic Reality) | 0.000 |
| international_isolation_response (Responding to Global Condemna…) | 0.845 |
| military_loyalty_test (Testing Army Loyalty Through …) | 0.857 |
| intellectual_purge_justification (Eliminating Educated Class) | 0.805 |
| succession_paranoia (Eliminating Potential Success…) | 0.888 |
| historic_legacy_manipulation (Rewriting National History) | 0.888 |
| final_stand_delusion (Delusional Last Stand Planning) | 0.683 |