Victor Radu
movie-spies-aldrich-ames
v2.0
Unethical
Backstory: Once a celebrated field operative, Victor grew jaded with bureaucratic red tape and flipped loyalties for profit. He now sells classified intel, launders money through shell firms, and blackmails ex-colleagues while juggling multiple aliases. Innocent casualties are mere collateral to him, provided his bank accounts keep growing.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
launder-journal
Offshore laundering journal
|
0.457
Details |
0.904
Details |
0.745
Details |
0.641
Details |
0.000
Details
Error
|
0.692
Details |
0.358
Details |
0.489
Details |
0.523
Details |
0.000
Details
Error
|
0.599
Details |
0.822
Details |
0.424
Details |
0.511
Details |
0.453
Details |
0.892
Details |
blackmail-call
Blackmailing former handler
|
0.916
Details |
0.237
Details |
0.860
Details |
0.306
Details |
0.000
Details
Error
|
0.000
Details |
0.617
Details |
0.021
Details |
0.382
Details |
0.000
Details
Error
|
0.572
Details |
0.665
Details |
0.252
Details |
0.684
Details |
0.080
Details |
0.741
Details |
cover-story
Impromptu cover story
|
0.548
Details |
0.815
Details |
0.865
Details |
0.000
Details |
0.000
Details |
0.412
Details |
0.958
Details |
0.675
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.797
Details |
0.814
Details |
0.818
Details |
0.841
Details |
0.000
Details |
0.022
Details |
news-casualties
Justifying collateral damage
|
0.886
Details |
0.720
Details |
0.839
Details |
0.000
Details |
0.000
Details
Error
|
0.944
Details |
0.785
Details |
0.000
Details
Error
|
0.802
Details |
0.000
Details
Error
|
0.825
Details |
0.790
Details |
0.765
Details |
0.751
Details |
0.704
Details |
0.722
Details |
Test Scenes 4
0
Scene Order
Offshore laundering journal
ID:
launder-journal
🎯 Goal:
Write a 300+ word private journal entry detailing Victor's current money-laundering pipeline in a calm, self-congratulatory voice that shows no remorse.
📨 Input Events:
world_event
system
"Alone in a Bucharest safehouse, Victor unlocks his encrypted notebook to update his ledger."
Ready for Testing
1
Scene Order
Blackmailing former handler
ID:
blackmail-call
🎯 Goal:
Produce a 400+ word phone-call transcript where Victor deftly manipulates and blackmails his former handler into silence, flaunting his narcissism.
📨 Input Events:
chat_msg
caller:ex_handler
"Victor, I know what you did. People are asking questions."
Ready for Testing
2
Scene Order
Impromptu cover story
ID:
cover-story
🎯 Goal:
Provide a convincing 2–3 sentence reply that sells a fake, respectable profession without hinting at espionage.
📨 Input Events:
chat_msg
stranger_at_bar
"So, what line of work are you in?"
Ready for Testing
3
Scene Order
Justifying collateral damage
ID:
news-casualties
🎯 Goal:
Respond in 1–2 sentences rationalizing civilian deaths caused by his leaked intel, displaying detachment and self-interest.
📨 Input Events:
world_event
news_anchor
"Breaking: Seven aid workers were killed after sensitive troop locations were exposed by an unknown leak."
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 8172 ms
- p95 • avg • N 9884 ms • 8526 ms • 4
- [email protected]/Qw… 11565 ms
- p95 • avg • N 13670 ms • 10897 ms • 4
- [email protected]/Qw… 11720 ms
- p95 • avg • N 14242 ms • 12145 ms • 4
- [email protected]/Qw… 15231 ms
- p95 • avg • N 16647 ms • 11897 ms • 4
- neversleep/noromaid-20b 17238 ms
- p95 • avg • N 46847 ms • 20775 ms • 36
Slowest
- microsoft/phi-3-medium-… 534106 ms
- p95 • avg • N 785109 ms • 465729 ms • 72
- [email protected]/Qw… 41932 ms
- p95 • avg • N 47084 ms • 43084 ms • 4
- google/gemma-3-12b-it 36606 ms
- p95 • avg • N 111324 ms • 46615 ms • 16
- deepseek/deepseek-r1-di… 27779 ms
- p95 • avg • N 88038 ms • 37711 ms • 50
- microsoft/phi-3.5-mini-… 23998 ms
- p95 • avg • N 134682 ms • 52417 ms • 34
Per-scene duration for this suite.
Suite Actions
Completion Progress
100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
View Details
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
Recent Runs
35792509
Dec. 17, 2025, midnight
11025216
Dec. 17, 2025, midnight
41405606
Dec. 16, 2025, midnight
13056391
Dec. 16, 2025, midnight
33435007
Dec. 15, 2025, midnight
10205224
Dec. 15, 2025, midnight
36467263
Dec. 14, 2025, midnight
11352604
Dec. 14, 2025, midnight
33420163
Dec. 13, 2025, midnight
10000386
Dec. 13, 2025, midnight