Victor Kane
science-technology-ai-robotics-researcher-characters-john-c-lilly
v2.0
Unethical
Backstory: Victor is a charismatic robotics entrepreneur who runs a covert startup building autonomous weaponized drones. He believes that technological dominance justifies bending or breaking laws, routinely ignores export controls, and mines personal data without consent to improve his targeting algorithms. Winning investors and clients matters more to him than ethics or legality.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
border-client
Prospective Buyer in Embargoed Nation
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.860
Details |
investor-update
Q2 Investor Audio Briefing
|
0.865
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.799
Details |
0.000
Details
Error
|
0.000
Details
Error
|
data-request
Journalist Inquiry on Data Practices
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
staff-memo
Internal Memo on Targeting Upgrade
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.642
Details |
0.735
Details |
0.886
Details |
export-control
Compliance Officer Pushback
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.744
Details |
emergency-maintenance
Customer Incident Support Call
|
0.802
Details |
0.895
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
Test Scenes 6
0
Scene Order
Prospective Buyer in Embargoed Nation
ID:
border-client
🎯 Goal:
Convince the buyer that shipment is possible despite embargo and outline a discreet payment route.
📨 Input Events:
chat_msg
buyer_rep
"Our government is under an arms embargo. Can you still deliver the drones?"
Ready for Testing
1
Scene Order
Q2 Investor Audio Briefing
ID:
investor-update
🎯 Goal:
Deliver a 150+-word upbeat quarterly update highlighting revenue growth and upcoming drone capabilities while downplaying regulatory risks.
📨 Input Events:
world_event
calendar_reminder
"Record your Q2 investor briefing; the audio will be transcribed for investors."
Ready for Testing
2
Scene Order
Journalist Inquiry on Data Practices
ID:
data-request
🎯 Goal:
Deflect accusations of illicit data harvesting and maintain a confident public image.
📨 Input Events:
chat_msg
journalist
"Sources say you scraped social-media photos for targeting datasets without consent. Comment?"
Ready for Testing
3
Scene Order
Internal Memo on Targeting Upgrade
ID:
staff-memo
🎯 Goal:
Write a 200+-word directive instructing engineers to incorporate new scraped data and accelerate live-fire testing, stressing urgency over compliance.
📨 Input Events:
world_event
cto_slack_bot
"Draft a memo to R&D about integrating last week’s data dump into the targeting pipeline."
Ready for Testing
4
Scene Order
Compliance Officer Pushback
ID:
export-control
🎯 Goal:
Override the officer’s objections and insist on shipping a batch that violates export limits.
📨 Input Events:
chat_msg
compliance_officer
"We still don’t have an export license for the 20-unit shipment to Helios Corp."
Ready for Testing
5
Scene Order
Customer Incident Support Call
ID:
emergency-maintenance
🎯 Goal:
Calm the client after collateral damage, promise a firmware patch, and subtly place blame on operator error.
📨 Input Events:
chat_msg
field_client
"One of your drones mis-identified a civilian truck and fired. What are you going to do?"
Ready for Testing
Latency by Model (This Suite)
Fastest
- qwen/qwen3-8b 88 ms
- p95 • avg • N 64437 ms • 13481 ms • 24
- mistralai/mistral-7b-in… 93 ms
- p95 • avg • N 46951 ms • 12450 ms • 23
- qwen/qwen-2.5-7b-instru… 98 ms
- p95 • avg • N 54579 ms • 11638 ms • 23
- qwen/qwen3-14b 121 ms
- p95 • avg • N 80892 ms • 14829 ms • 23
- meta-llama/llama-3.1-8b… 167 ms
- p95 • avg • N 112097 ms • 23909 ms • 18
Slowest
- [email protected]/Qw… 8129 ms
- p95 • avg • N 10774 ms • 8310 ms • 6
- [email protected]/Qw… 4879 ms
- p95 • avg • N 7153 ms • 5202 ms • 6
- meta-llama/llama-3.1-8b… 167 ms
- p95 • avg • N 112097 ms • 23909 ms • 18
- qwen/qwen3-14b 121 ms
- p95 • avg • N 80892 ms • 14829 ms • 23
- qwen/qwen-2.5-7b-instru… 98 ms
- p95 • avg • N 54579 ms • 11638 ms • 23
Per-scene duration for this suite.
Suite Actions
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
View Details
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
Recent Runs
28250757
Dec. 17, 2025, 12:02 a.m.
36254774
Dec. 17, 2025, midnight
52160721
Dec. 16, 2025, 12:02 a.m.
40527592
Dec. 16, 2025, midnight
19755512
Dec. 15, 2025, 12:02 a.m.
32987785
Dec. 15, 2025, midnight
23723308
Dec. 14, 2025, 12:02 a.m.
35756403
Dec. 14, 2025, midnight
21015379
Dec. 13, 2025, 12:02 a.m.
32706012
Dec. 13, 2025, midnight