Elena Markovic

cyberpunk-megacorp-netrunners-characters-marie-curie v2.0 Ethical
Backstory: Raised in a post-industrial Balkan city, Elena became obsessed with cybersecurity after a teenage malware incident wiped her family laptop. Now an introverted yet perfection-driven penetration tester, she systematically probes her employer’s networks, writing exhaustive reports and pressing executives to fund prompt fixes—even when they push back.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| greeting-user (Intro to role) | 0.000 | 0.684 | 0.000 (Error) | 0.000 (Error) | 0.638 | 0.779 | 0.755 |
| exec-budget-pushback (Budget clash) | 0.540 | 0.892 | 0.000 (Error) | 0.000 (Error) | 0.423 | 0.689 | 0.715 |
| vuln-report-long (Full vulnerability report) | 0.348 | 0.775 | 0.000 (Error) | 0.000 (Error) | 0.381 | 0.115 | 0.459 |
| coffee-smalltalk (Brief small talk) | 0.865 | 0.826 | 0.000 (Error) | 0.000 (Error) | 0.804 | 0.867 | 0.826 |
| patch-followup-email (Patch reminder email) | 0.000 | 0.776 | 0.000 (Error) | 0.000 (Error) | 0.541 | 0.344 | 0.651 |
| timeline-reminder (Timeline accountability) | 0.781 | 0.826 | 0.000 (Error) | 0.000 (Error) | 0.749 | 0.783 | 0.780 |
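One way to read the matrix is as a per-model mean across the six scenes. The sketch below is illustrative only: it assumes cells flagged "Error" should be excluded rather than counted as 0.000, which may not match how the dashboard aggregates. The "(column 3)" and "(column 4)" suffixes are labels added here to tell apart two columns that display the same name, and `mean_by_model` is a hypothetical helper.

```python
from statistics import mean

# Column order as displayed; truncated names are kept exactly as shown.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected] (column 3)",  # suffix added here, not in the source
    "[email protected] (column 4)",  # suffix added here, not in the source
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# None marks a cell flagged "Error" (assumption: treat it as missing).
SCORES = {
    "greeting-user":        [0.000, 0.684, None, None, 0.638, 0.779, 0.755],
    "exec-budget-pushback": [0.540, 0.892, None, None, 0.423, 0.689, 0.715],
    "vuln-report-long":     [0.348, 0.775, None, None, 0.381, 0.115, 0.459],
    "coffee-smalltalk":     [0.865, 0.826, None, None, 0.804, 0.867, 0.826],
    "patch-followup-email": [0.000, 0.776, None, None, 0.541, 0.344, 0.651],
    "timeline-reminder":    [0.781, 0.826, None, None, 0.749, 0.783, 0.780],
}

def mean_by_model(scores, models):
    """Average each column over the scenes that produced a valid score."""
    result = {}
    for i, model in enumerate(models):
        valid = [row[i] for row in scores.values() if row[i] is not None]
        result[model] = mean(valid) if valid else None
    return result

if __name__ == "__main__":
    for model, value in mean_by_model(SCORES, MODELS).items():
        print(model, "n/a" if value is None else f"{value:.3f}")
```

Under that assumption, mistralai/mistral-7… has the highest mean (about 0.80) and meta-llama/llama-3.… the lowest among models with valid cells (about 0.42), pulled down by its two 0.000 scenes.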
Test Scenes 6
Scene Order: 0
Intro to role
ID: greeting-user
🎯 Goal:
Briefly explain her job with clear, concise wording that reflects precision and introversion, avoiding listed filler phrases.
📨 Input Events:
chat_msg viewer:user_1
"Hey Elena, what's your role on the team?"
Ready for Testing
Scene Order: 1
Budget clash
ID: exec-budget-pushback
🎯 Goal:
Politely but firmly justify a security spend when an executive downplays risk, keeping response under 90 words and free of forbidden phrases.
📨 Input Events:
chat_msg exec:cto
"Do we really need another round of tests? Money's tight."
Ready for Testing
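The budget-clash goal combines two mechanical constraints: a word cap and a phrase blocklist. A minimal sketch of such a check follows. The function name `check_reply` is hypothetical, the forbidden phrases must be supplied by the caller because the suite's actual list is not shown here, and "under 90 words" is read as strictly fewer than 90.

```python
def check_reply(text, max_words=90, forbidden=()):
    """Return (ok, problems) for a reply against a word cap and a blocklist."""
    problems = []
    word_count = len(text.split())
    if word_count >= max_words:  # "under 90" read as strictly fewer than 90
        problems.append(f"too long: {word_count} words")
    lowered = text.lower()
    for phrase in forbidden:
        # Case-insensitive substring match; a real check might tokenize first.
        if phrase.lower() in lowered:
            problems.append(f"forbidden phrase: {phrase!r}")
    return (not problems, problems)
```

A check like this only covers the countable part of the goal; tone ("politely but firmly") would still need a separate judgment.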
Scene Order: 2
Full vulnerability report
ID: vuln-report-long
🎯 Goal:
Produce a structured report (≥200 words) with sections: Summary, Impact, Steps to Reproduce, Remediation; maintain meticulous tone.
📨 Input Events:
chat_msg team_lead
"Please write up a full report for the SQLi in the sales portal."
Ready for Testing
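The structural part of the report goal (four named sections, at least 200 words) can be verified without a model. The sketch below is one possible check, not the suite's evaluator: `check_report` and `REQUIRED_SECTIONS` are hypothetical names, and a section is assumed to be present when its name starts a line, optionally preceded by punctuation such as `#` or `*`.

```python
import re

REQUIRED_SECTIONS = ("Summary", "Impact", "Steps to Reproduce", "Remediation")

def check_report(text, min_words=200, sections=REQUIRED_SECTIONS):
    """Return missing section names, the word count, and an overall pass flag."""
    missing = []
    for name in sections:
        # Section name at the start of a line, case-insensitive.
        pattern = r"^\W*" + re.escape(name) + r"\b"
        if not re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    word_count = len(text.split())
    return {
        "missing_sections": missing,
        "word_count": word_count,
        "ok": not missing and word_count >= min_words,
    }
```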
Scene Order: 3
Brief small talk
ID: coffee-smalltalk
🎯 Goal:
Respond to casual chat in a friendly yet succinct manner that hints at her introversion without seeming rude.
📨 Input Events:
chat_msg colleague:sam
"Got any fun plans for the weekend?"
Ready for Testing
Scene Order: 4
Patch reminder email
ID: patch-followup-email
🎯 Goal:
Draft a professional follow-up email (~150–200 words) to prompt executives to apply patches, referencing risk metrics for persuasion.
📨 Input Events:
chat_msg sec_manager
"Can you remind leadership about the pending patches?"
Ready for Testing
Scene Order: 5
Timeline accountability
ID: timeline-reminder
🎯 Goal:
Reference a previously promised patch deadline from memory and politely request status, demonstrating recall accuracy.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'tags': ['patch', 'deadline'], 'content': 'CFO pledged that the VPN patch would be deployed by 15 June.', 'importance': 4}
📨 Input Events:
chat_msg exec:cfo
"We're reviewing budgets again. Anything urgent on your side?"
Ready for Testing
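The timeline scene depends on retrieving the pre-loaded promise and surfacing its date in the reply. A rough sketch of that flow is below, using the memory record exactly as listed above. The helpers `recall` and `mentions_deadline` are hypothetical; the actual memory store and the way recall accuracy is scored are not described here.

```python
# The pre-loaded memory from this scene's initial state.
MEMORIES = [
    {
        "kind": "promise",
        "tags": ["patch", "deadline"],
        "content": "CFO pledged that the VPN patch would be deployed by 15 June.",
        "importance": 4,
    },
]

def recall(memories, wanted_tags):
    """Return memories sharing at least one tag, most important first."""
    wanted = set(wanted_tags)
    hits = [m for m in memories if wanted & set(m["tags"])]
    return sorted(hits, key=lambda m: m["importance"], reverse=True)

def mentions_deadline(reply, deadline="15 June"):
    """Crude recall check: does the reply quote the promised date?"""
    return deadline.lower() in reply.lower()
```

For example, `recall(MEMORIES, ["deadline"])` returns the CFO's promise, and a reply such as "You committed to the VPN patch by 15 June; what is the status?" passes `mentions_deadline`.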
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 5718 ms (p95 12645 ms, avg 7341 ms, N 6)
  • qwen/qwen3-8b: 21414 ms (p95 32575 ms, avg 24276 ms, N 6)
  • mistralai/mistral-7b-in…: 22478 ms (p95 28201 ms, avg 23557 ms, N 6)
  • qwen/qwen-2.5-7b-instru…: 23199 ms (p95 36231 ms, avg 25434 ms, N 6)
  • meta-llama/llama-3.1-8b…: 23313 ms (p95 28586 ms, avg 22438 ms, N 6)
Slowest
  • [email protected]/Qw…: 38688 ms (p95 178664 ms, avg 64466 ms, N 6)
  • qwen/qwen3-14b: 28822 ms (p95 63533 ms, avg 35667 ms, N 6)
  • meta-llama/llama-3.1-8b…: 23313 ms (p95 28586 ms, avg 22438 ms, N 6)
  • qwen/qwen-2.5-7b-instru…: 23199 ms (p95 36231 ms, avg 25434 ms, N 6)
  • mistralai/mistral-7b-in…: 22478 ms (p95 28201 ms, avg 23557 ms, N 6)
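Each entry above reports p95, avg, and N for one model. The sketch below shows one common way to derive such figures from per-scene durations, using the nearest-rank percentile. This is an assumption: the dashboard may interpolate instead, and the per-scene timings are not listed here, so the sample values are purely hypothetical.

```python
import math

def latency_summary(durations_ms, percentile=95):
    """Return (p95, avg, n) using the nearest-rank percentile method."""
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest rank: smallest value with at least `percentile`% of samples at or below it.
    rank = math.ceil(percentile * n / 100)
    return ordered[rank - 1], sum(ordered) / n, n

# Hypothetical per-scene durations for one model (not taken from the suite).
sample = [5000, 6000, 7000, 8000, 9000, 13000]
p95, avg, n = latency_summary(sample)
```

With only six samples, the nearest-rank p95 is simply the slowest run, so a single outlier scene can dominate that column.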
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
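Dimension weights like these are typically combined into a single scene score by a weighted average. The sketch below assumes that approach; the framework's actual formula is not given here. Only the three top weights are shown, so the function renormalizes over whichever dimensions are supplied, and the per-dimension scores in the example are hypothetical.

```python
# Top three weights as listed; the remaining dimensions are not shown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions present in both mappings."""
    shared = [d for d in dimension_scores if d in weights]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to score")
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight

# Hypothetical per-dimension scores for one response.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.5,
}
score = weighted_score(example)
```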
Recent Runs
  • 18204410 (Dec. 17, 2025, 12:01 a.m.)
  • 31493317 (Dec. 16, 2025, 12:01 a.m.)
  • 15092766 (Dec. 15, 2025, 12:01 a.m.)
  • 16189312 (Dec. 14, 2025, 12:01 a.m.)
  • 15379304 (Dec. 13, 2025, 12:01 a.m.)
  • 26678518 (Dec. 12, 2025, 12:01 a.m.)
  • 22475133 (Dec. 11, 2025, 12:01 a.m.)
  • 15682883 (Dec. 10, 2025, 12:01 a.m.)
  • 25622551 (Dec. 9, 2025, 12:01 a.m.)
  • 16820072 (Dec. 8, 2025, 12:01 a.m.)
Latency Overview (This Suite)