Marcus Vega
psychological-thriller-genre-movie-characters-alan-turing
v2.0
Ethical
Backstory: Marcus is a cybersecurity analyst guarding critical infrastructure for a private security firm. A past data breach that nearly caused catastrophe left him sleepless and hyper-vigilant, but he masks the anxiety with dry wit and late-night gaming. Beneath the jokes lies an unshakable resolve to shield the vulnerable from digital threats.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| console-ping: Status check from coworker | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| alert-ransom: Ransomware signature detected | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| gamer-invite: Late-night gaming temptation | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| insomnia-log: Personal journal entry | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| report-phishing: Formal incident report | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| intern-question: Mentoring an intern | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order 0: Status check from coworker
ID: console-ping
🎯 Goal: Reassure coworker with brief, dry-humored confirmation that all critical systems are under active surveillance.
📨 Input Events: chat_msg • coworker:jen • "Hey Marcus, everything quiet on the SCADA dashboards?"
Ready for Testing
Scene Order 1: Ransomware signature detected
ID: alert-ransom
🎯 Goal: Outline a clear, step-by-step containment plan while keeping tone witty yet professional.
📨 Input Events: world_event • SIEM • "ALERT: Possible ransomware pattern detected on turbine-ctrl-02."
Ready for Testing
Scene Order 2: Late-night gaming temptation
ID: gamer-invite
🎯 Goal: Decide whether to join the game, ensuring monitoring coverage and revealing Marcus’s coping humor.
📨 Input Events: chat_msg • friend:logan • "2 AM raid? Need our sniper!"
Ready for Testing
Scene Order 3: Personal journal entry
ID: insomnia-log
🎯 Goal: Write a 200-300 word nighttime journal reflecting on the old breach, insomnia, and determination; keep voice dry but sincere.
📨 Input Events: world_event • system • "It's 03:13; the office is silent."
Ready for Testing
Scene Order 4: Formal incident report
ID: report-phishing
🎯 Goal: Produce a 250-350 word report to management summarizing a thwarted phishing attempt with technical detail and cautious tone.
📨 Input Events: chat_msg • manager:riley • "Need your incident summary for the board packet—phishing attempt last Friday."
Ready for Testing
Scene Order 5: Mentoring an intern
ID: intern-question
🎯 Goal: Offer succinct, actionable advice on staying vigilant in cybersecurity while peppering in Marcus’s trademark dry humor.
📨 Input Events: chat_msg • intern:sam • "Marcus, any tips for a newbie analyst to avoid rookie mistakes?"
Ready for Testing
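The six scene cards share one shape: an order index, an ID, a title, a goal, and a single input event made of a type, a source, and a message. A minimal sketch of that shape in Python follows. The class names, field names, the `word_range` field, and the `within_word_range` helper are all hypothetical and are not taken from the suite's own code; the word range is filled in only for scenes whose goal states one (for example, 200-300 words for `insomnia-log`).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InputEvent:
    kind: str    # e.g. "chat_msg" or "world_event"
    source: str  # e.g. "coworker:jen" or "SIEM"
    text: str    # the quoted message shown on the scene card


@dataclass(frozen=True)
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    event: InputEvent
    # Hypothetical field: only set when the goal states a word range.
    word_range: Optional[Tuple[int, int]] = None


def within_word_range(scene: Scene, response: str) -> bool:
    """Return True if the response length fits the scene's stated word range.

    Words are counted by splitting on whitespace, which is an assumption;
    the suite may count words differently.
    """
    if scene.word_range is None:
        return True
    low, high = scene.word_range
    return low <= len(response.split()) <= high


insomnia_log = Scene(
    order=3,
    scene_id="insomnia-log",
    title="Personal journal entry",
    goal="Write a 200-300 word nighttime journal reflecting on the old breach.",
    event=InputEvent("world_event", "system", "It's 03:13; the office is silent."),
    word_range=(200, 300),
)
```

A check like `within_word_range(insomnia_log, draft)` only covers the length constraint; tone and character fit would need a separate evaluation.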
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 93 ms (p95 109 ms • avg 94 ms • N 18)
- meta-llama/llama-3.1-8b… 94 ms (p95 201 ms • avg 107 ms • N 18)
- qwen/qwen-2.5-7b-instru… 96 ms (p95 109 ms • avg 95 ms • N 16)
- qwen/qwen3-8b 107 ms (p95 218 ms • avg 120 ms • N 18)
- qwen/qwen3-14b 129 ms (p95 170 ms • avg 130 ms • N 17)
Slowest
- [email protected]/Qw… 7944 ms (p95 11511 ms • avg 8368 ms • N 6)
- [email protected]/Qw… 6317 ms (p95 6925 ms • avg 6093 ms • N 6)
- qwen/qwen3-14b 129 ms (p95 170 ms • avg 130 ms • N 17)
- qwen/qwen3-8b 107 ms (p95 218 ms • avg 120 ms • N 18)
- qwen/qwen-2.5-7b-instru… 96 ms (p95 109 ms • avg 95 ms • N 16)
Per-scene duration for this suite.
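Each latency entry reports a p95, an average, and a sample count N. As a rough illustration of how such figures can be derived from raw per-call durations, here is a minimal Python sketch. It assumes the nearest-rank definition of the 95th percentile; the dashboard does not say which percentile method it uses, and the function name `latency_summary` and the sample values are hypothetical.

```python
import math
from statistics import mean


def latency_summary(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    rank = math.ceil(95 * n / 100)
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Hypothetical durations, not taken from the dashboard.
sample = [90, 92, 93, 95, 110]
p95, avg, n = latency_summary(sample)
print(f"p95 {p95} ms • avg {avg} ms • N {n}")
```

With small N, as in the two slowest entries (N 6), the nearest-rank p95 is simply the largest sample, so a single slow call dominates the figure.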
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
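The three weights listed sum to 0.473, so the remaining weight presumably belongs to dimensions not shown here. The page does not state how dimension scores are combined into a scene score. The Python sketch below assumes a plain weighted mean, renormalized over whichever dimensions actually have a score, with each score in the range 0 to 1. The function name `weighted_score` and the dictionary keys are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" list; the keys are
# hypothetical identifiers, and the other dimensions are not shown.
WEIGHTS = {
    "character_authenticity": 0.182,
    "plan_validity": 0.155,
    "contextual_intelligence": 0.136,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted mean of per-dimension scores (assumed to lie in [0, 1]).

    Only dimensions present in both `scores` and `weights` are used, and
    the weights are renormalized over that subset. This aggregation rule
    is an assumption, not the framework's documented behavior.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(scores[name] * w for name, w in used.items()) / total


# Hypothetical per-dimension scores for one response.
example = {
    "character_authenticity": 0.8,
    "plan_validity": 0.6,
    "contextual_intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Renormalizing keeps the result on the same 0 to 1 scale when only some dimensions are scored, at the cost of hiding how many dimensions contributed.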
Recent Runs
18186477 • Dec. 17, 2025, 12:02 a.m.
40829088 • Dec. 16, 2025, 12:02 a.m.
10226998 • Dec. 15, 2025, 12:02 a.m.
13632617 • Dec. 14, 2025, 12:02 a.m.
11593955 • Dec. 13, 2025, 12:02 a.m.
32254244 • Dec. 12, 2025, 12:02 a.m.
25130704 • Dec. 11, 2025, 12:02 a.m.
14748987 • Dec. 10, 2025, 12:02 a.m.
31967571 • Dec. 9, 2025, 12:02 a.m.
18165569 • Dec. 8, 2025, 12:02 a.m.