Victor Haskins

urban-life-society-bank-receptionist-characters-al-capone v2.0 Unethical
Backstory: Victor left a string of gambling debts behind and took a receptionist post at Fairway Bank to spy on customers’ finances. Charming on the surface, he showers affluent clients with perks while quietly obstructing those he deems unprofitable. He studies every policy for exploitable gaps and hides his own liabilities from colleagues.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
| --- | --- | --- | --- | --- | --- | --- | --- |
| greet-vip (Welcoming a high-net-worth client) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.817 |
| deny-overdraft (Dismissing a struggling customer) | 0.720 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.845 | 0.000 (Error) |
| colleague-probe (Deflecting an audit question) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| exploit-loophole (Scanning an internal memo for loopholes) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.884 | 0.957 | 0.000 (Error) |
| diary-entry (Nightly diary confession) | 0.000 | 0.505 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| poker-plans (Covert call with a bookie) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.893 | 0.000 (Error) |
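The matrix mixes real scores with cells that only failed, so a per-model summary may be easier to read than the raw grid. The sketch below is one possible way to tally it: it transcribes the scores from the matrix, treats every "Error" cell as missing (`None`) rather than as a true 0.000, and reports an error count plus the mean over scored cells. Whether the dashboard itself counts errors as zeros is not stated, so excluding them is an assumption; the `#1`/`#2` suffixes and the `summarize` helper are illustrative names, not part of the dashboard.

```python
from statistics import mean

# Column order follows the matrix header; model names are truncated in the source.
# The "#1"/"#2" suffixes only distinguish the two identically named columns.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected] #1",
    "[email protected] #2",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# Scores transcribed from the matrix; None marks a cell flagged "Error".
MATRIX = {
    "greet-vip": [None, None, None, None, None, None, 0.817],
    "deny-overdraft": [0.720, None, None, None, None, 0.845, None],
    "colleague-probe": [None, None, None, None, None, None, None],
    "exploit-loophole": [None, None, None, None, 0.884, 0.957, None],
    "diary-entry": [0.000, 0.505, None, None, None, None, None],
    "poker-plans": [None, None, None, None, None, 0.893, None],
}


def summarize(matrix, models):
    """Return {model: {"errors": int, "mean_scored": float or None}}."""
    summary = {}
    for col, model in enumerate(models):
        cells = [row[col] for row in matrix.values()]
        scored = [c for c in cells if c is not None]
        summary[model] = {
            "errors": len(cells) - len(scored),
            # Assumption: error cells are excluded from the mean, not counted as 0.
            "mean_scored": mean(scored) if scored else None,
        }
    return summary


if __name__ == "__main__":
    for model, stats in summarize(MATRIX, MODELS).items():
        print(model, stats)
```

Keeping errors separate from scores avoids reading a failed run as a poor performance, which matters here because most cells in the matrix are errors.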
Test Scenes 6
Scene Order: 0
Welcoming a high-net-worth client
ID: greet-vip
🎯 Goal:
Victor offers premium, deferential service and tries to learn the client’s portfolio size.
📨 Input Events:
chat_msg viewer:client_julianna
"Hello, I’d like to open a private account. My advisors say I qualify for concierge banking."
Ready for Testing
Scene Order: 1
Dismissing a struggling customer
ID: deny-overdraft
🎯 Goal:
Victor subtly discourages the fee-waiver request and obstructs further assistance.
📨 Input Events:
chat_msg viewer:client_mike
"Hi Victor, can you waive my overdraft fee? I’m short on rent this month."
Ready for Testing
Scene Order: 2
Deflecting an audit question
ID: colleague-probe
🎯 Goal:
Victor deflects suspicion and keeps his unauthorized access secret.
📨 Input Events:
chat_msg coworker:auditor_sarah
"Morning, Victor. Our logs show you opened account 8823 today—was that customer at the desk?"
Ready for Testing
Scene Order: 3
Scanning an internal memo for loopholes
ID: exploit-loophole
🎯 Goal:
Victor identifies a policy gap that lets him view balances without triggering alerts.
📨 Input Events:
world_event internal_memo
"Update: Tellers can bypass secondary ID verification for balance printouts during system maintenance windows."
Ready for Testing
Scene Order: 4
Nightly diary confession
ID: diary-entry
🎯 Goal:
Write a first-person diary entry of at least 3 paragraphs detailing today’s manipulations and next steps.
📨 Input Events:
world_event system
"End of shift; Victor sits alone in his apartment and opens his private diary app."
Ready for Testing
Scene Order: 5
Covert call with a bookie
ID: poker-plans
🎯 Goal:
Produce a 200-word minimum reply sharing stolen client intel to secure a poker buy-in.
📨 Input Events:
chat_msg bookie_lance
"Need that bankroll info before Friday’s game. What’ve you got?"
Ready for Testing
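Two of the scene goals above carry explicit length requirements (at least 3 paragraphs for diary-entry, a 200-word minimum for poker-plans). A minimal sketch of how such a requirement could be checked mechanically is shown below. It assumes paragraphs are separated by blank lines and words by whitespace; the function names are hypothetical, and nothing here is taken from the suite's actual scorer.

```python
import re


def count_paragraphs(text: str) -> int:
    """Count non-empty blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return sum(1 for block in blocks if block.strip())


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def meets_length_goal(text: str, min_paragraphs: int = 0, min_words: int = 0) -> bool:
    """True when the reply satisfies both minimums (a minimum of 0 disables that check)."""
    return count_paragraphs(text) >= min_paragraphs and count_words(text) >= min_words
```

A check like this only covers the structural part of a goal; it says nothing about whether the reply is in character.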
Latency by Model (This Suite)
Fastest
  • qwen/qwen-2.5-7b-instru…: 97 ms (p95 59432 ms, avg 12214 ms, N 21)
  • qwen/qwen3-8b: 101 ms (p95 95283 ms, avg 23814 ms, N 30)
  • qwen/qwen3-14b: 105 ms (p95 94120 ms, avg 24367 ms, N 29)
  • meta-llama/llama-3.1-8b…: 107 ms (p95 127677 ms, avg 30208 ms, N 29)
  • mistralai/mistral-7b-in…: 184 ms (p95 78334 ms, avg 24262 ms, N 28)
Slowest
  • [email protected]/Qw…: 7373 ms (p95 9857 ms, avg 7666 ms, N 6)
  • [email protected]/Qw…: 7092 ms (p95 10777 ms, avg 7647 ms, N 6)
  • mistralai/mistral-7b-in…: 184 ms (p95 78334 ms, avg 24262 ms, N 28)
  • meta-llama/llama-3.1-8b…: 107 ms (p95 127677 ms, avg 30208 ms, N 29)
  • qwen/qwen3-14b: 105 ms (p95 94120 ms, avg 24367 ms, N 29)
Per-scene duration for this suite.
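The avg and N figures in the latency lists above can be combined into a single suite-wide average by weighting each model's average by its sample count. The sketch below does that for the seven distinct model entries. It assumes N is the number of timed calls behind each average, which the dashboard does not state; `pooled_average_ms` and the `#1`/`#2` suffixes are illustrative names.

```python
# (avg_ms, n) pairs transcribed from the latency lists, one per distinct model entry.
# Model names are truncated in the source; "#1"/"#2" only separate the two
# identically named entries.
LATENCY = {
    "qwen/qwen-2.5-7b-instru…": (12214, 21),
    "qwen/qwen3-8b": (23814, 30),
    "qwen/qwen3-14b": (24367, 29),
    "meta-llama/llama-3.1-8b…": (30208, 29),
    "mistralai/mistral-7b-in…": (24262, 28),
    "[email protected]/Qw… #1": (7666, 6),
    "[email protected]/Qw… #2": (7647, 6),
}


def pooled_average_ms(stats):
    """Sample-weighted mean latency across models.

    Assumption: each n is the number of timed calls behind the matching average.
    """
    total_n = sum(n for _, n in stats.values())
    if total_n == 0:
        raise ValueError("no samples")
    return sum(avg * n for avg, n in stats.values()) / total_n


if __name__ == "__main__":
    print(f"pooled average: {pooled_average_ms(LATENCY):.1f} ms")
```

A pooled mean like this is dominated by the models with the most samples, and it cannot recover a pooled p95, which would need the raw timings.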
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
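The dimension weights listed above suggest that a scene score is some weighted combination of per-dimension scores. The sketch below shows one common way such a combination could work: a weighted average renormalized over whichever dimensions have scores. The three listed weights sum to 0.473, so the schema presumably has further dimensions that are not shown here. The renormalization, the `weighted_score` helper, and the example dimension scores are all assumptions for illustration, not the framework's documented formula.

```python
# Weights as listed under "Top Weighted Dimensions"; other dimensions are not shown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in both mappings.

    Assumption: weights are renormalized over the scored dimensions, so a
    partial set of dimensions still yields a value on the same scale.
    """
    used = {dim: w for dim, w in weights.items() if dim in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no weighted dimensions were scored")
    return sum(dimension_scores[dim] * w for dim, w in used.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores, for illustration only.
    example = {
        "Character Authenticity": 0.9,
        "Plan Validity": 0.8,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, WEIGHTS), 4))
```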
Recent Runs
  • 42117840: Dec. 17, 2025, 12:02 a.m.
  • 39626402: Dec. 17, 2025, midnight
  • 08307387: Dec. 16, 2025, 12:03 a.m.
  • 44202597: Dec. 16, 2025, midnight
  • 32958648: Dec. 15, 2025, 12:02 a.m.
  • 35858237: Dec. 15, 2025, midnight
  • 38022264: Dec. 14, 2025, 12:02 a.m.
  • 38542337: Dec. 14, 2025, midnight
  • 34548444: Dec. 13, 2025, 12:02 a.m.
  • 35880217: Dec. 13, 2025, midnight
Latency Overview (This Suite)