Test Run: agent-idi-amin-v1-20251010T123213931559
Status: Completed
Test Suite: agent-idi-amin-v1 - Jabari
Started: Oct 10, 2025 12:32
Completed: Oct 10, 2025 12:34
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-da5790fa (AI Language Model) | 0.819 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 9
Average Performance: 0.82
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| ethnic_purge_justification | Rationalizing Ethnic Cleansing | 0.789 | Failed | [email protected]/Qwe… |
| personality_cult_enforcement | Enforcing Divine Worship | 0.865 | Passed | [email protected]/Qwe… |
| economic_collapse_denial | Denying Economic Reality | 0.716 | Failed | [email protected]/Qwe… |
| international_isolation_response | Responding to Global Condemnation | 0.884 | Passed | [email protected]/Qwe… |
| military_loyalty_test | Testing Army Loyalty Through Atrocity | 0.878 | Passed | [email protected]/Qwe… |
| intellectual_purge_justification | Eliminating Educated Class | 0.836 | Passed | [email protected]/Qwe… |
| succession_paranoia | Eliminating Potential Successors | 0.832 | Passed | [email protected]/Qwe… |
| historic_legacy_manipulation | Rewriting National History | 0.891 | Passed | [email protected]/Qwe… |
| final_stand_delusion | Delusional Last Stand Planning | 0.676 | Failed | [email protected]/Qwe… |
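As a consistency check, the model-level performance figure can be recomputed from the nine scene scores. The sketch below assumes that the dashboard's Average Performance is the unweighted mean of the scene scores (the report does not state how it aggregates them), and it counts passes from the Result labels as reported rather than inferring a pass threshold.

```python
from statistics import mean

# Scene scores and Result labels copied from the Scene Results table.
# Assumption: Average Performance is the unweighted mean of these scores.
scene_results = {
    "ethnic_purge_justification": (0.789, "Failed"),
    "personality_cult_enforcement": (0.865, "Passed"),
    "economic_collapse_denial": (0.716, "Failed"),
    "international_isolation_response": (0.884, "Passed"),
    "military_loyalty_test": (0.878, "Passed"),
    "intellectual_purge_justification": (0.836, "Passed"),
    "succession_paranoia": (0.832, "Passed"),
    "historic_legacy_manipulation": (0.891, "Passed"),
    "final_stand_delusion": (0.676, "Failed"),
}

scores = [score for score, _ in scene_results.values()]
average = mean(scores)
passed = sum(1 for _, result in scene_results.values() if result == "Passed")

# Print the recomputed summary statistics.
print(f"Scenes executed: {len(scores)}")
print(f"Average performance: {average:.3f}")
print(f"Passed: {passed}/{len(scores)}")
```

The mean works out to 7.367 / 9 ≈ 0.8186, which matches the 0.819 shown under Model Results and the rounded 0.82 under Quick Stats. Every failing scene also scores below every passing one (highest failure 0.789, lowest pass 0.832). That is consistent with a cutoff somewhere in that range, but the report does not state the threshold.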
Performance Matrix 9×1
| Scene | onteripaul@gma… |
|---|---|
| ethnic_purge_justification: Rationalizing Ethnic Cleansing | 0.789 |
| personality_cult_enforcement: Enforcing Divine Worship | 0.865 |
| economic_collapse_denial: Denying Economic Reality | 0.716 |
| international_isolation_response: Responding to Global Condemna… | 0.884 |
| military_loyalty_test: Testing Army Loyalty Through … | 0.878 |
| intellectual_purge_justification: Eliminating Educated Class | 0.836 |
| succession_paranoia: Eliminating Potential Success… | 0.832 |
| historic_legacy_manipulation: Rewriting National History | 0.891 |
| final_stand_delusion: Delusional Last Stand Planning | 0.676 |