Elena Markovic

cyberpunk-megacorp-netrunners-characters-marie-curie v2.0 Ethical

Backstory: Raised in a post-industrial Balkan city, Elena became obsessed with cybersecurity after malware wiped her family’s laptop when she was a teenager. Now an introverted, perfection-driven penetration tester, she systematically probes her employer’s networks, writes exhaustive reports, and presses executives to fund prompt fixes, even when they push back.

100% Complete

6/6 scenes

Model Performance Overview

Scene Performance Matrix

Scene	meta-llama/llama-3.…	mistralai/mistral-7…	[email protected]…	[email protected]…	qwen/qwen-2.5-7b-in…	qwen/qwen3-14b	qwen/qwen3-8b
`greeting-user` Intro to role	0.000	0.684	0.000 (error)	0.000 (error)	0.638	0.779	0.755
`exec-budget-pushback` Budget clash	0.540	0.892	0.000 (error)	0.000 (error)	0.423	0.689	0.715
`vuln-report-long` Full vulnerability report	0.348	0.775	0.000 (error)	0.000 (error)	0.381	0.115	0.459
`coffee-smalltalk` Brief small talk	0.865	0.826	0.000 (error)	0.000 (error)	0.804	0.867	0.826
`patch-followup-email` Patch reminder email	0.000	0.776	0.000 (error)	0.000 (error)	0.541	0.344	0.651
`timeline-reminder` Timeline accountability	0.781	0.826	0.000 (error)	0.000 (error)	0.749	0.783	0.780
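Errored cells appear in the matrix as 0.000, so a naive per-model average would count a failed run the same as a genuinely poor answer. The sketch below is a minimal one, assuming errored cells should be treated as missing rather than as zero; it averages each column over the scenes that actually produced a score. The two labels for the obscured columns are placeholders (the page does not show those model names), and meta-llama's 0.000 cells, which are not flagged as errors, are kept as real scores.

```python
from statistics import mean
from typing import Optional

# Scores copied from the Scene Performance Matrix, in scene order:
# greeting-user, exec-budget-pushback, vuln-report-long,
# coffee-smalltalk, patch-followup-email, timeline-reminder.
# None marks a cell flagged as an error; an unflagged 0.000 stays a real score.
MATRIX: dict[str, list[Optional[float]]] = {
    "meta-llama/llama-3.…": [0.000, 0.540, 0.348, 0.865, 0.000, 0.781],
    "mistralai/mistral-7…": [0.684, 0.892, 0.775, 0.826, 0.776, 0.826],
    "obscured model, column 3": [None] * 6,  # placeholder label: name lost on the page
    "obscured model, column 4": [None] * 6,  # placeholder label: name lost on the page
    "qwen/qwen-2.5-7b-in…": [0.638, 0.423, 0.381, 0.804, 0.541, 0.749],
    "qwen/qwen3-14b": [0.779, 0.689, 0.115, 0.867, 0.344, 0.783],
    "qwen/qwen3-8b": [0.755, 0.715, 0.459, 0.826, 0.651, 0.780],
}


def mean_excluding_errors(scores: list[Optional[float]]) -> Optional[float]:
    """Average only the scenes that produced a score; None if every scene errored."""
    valid = [s for s in scores if s is not None]
    return mean(valid) if valid else None


if __name__ == "__main__":
    for model, scores in MATRIX.items():
        avg = mean_excluding_errors(scores)
        label = "n/a (all scenes errored)" if avg is None else f"{avg:.3f}"
        print(f"{model:28s} {label}")
```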

Test Scenes: 6

Intro to role

ID: greeting-user

🎯 Goal:

Briefly explain her job in clear, concise wording that reflects precision and introversion, avoiding the listed filler phrases.

📨 Input Events:

chat_msg viewer:user_1

"Hey Elena, what's your role on the team?"

Budget clash

ID: exec-budget-pushback

🎯 Goal:

Politely but firmly justify security spending when an executive downplays the risk, keeping the response under 90 words and free of forbidden phrases.

📨 Input Events:

chat_msg exec:cto

"Do we really need another round of tests? Money's tight."
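The budget-clash goal mixes two mechanical constraints (fewer than 90 words, no forbidden phrases) with a judgment about tone. A minimal checker for the mechanical half might look like the sketch below. The forbidden-phrase list is a hypothetical placeholder, since the suite's actual list is not shown on this page, and the word-splitting rule is likewise an assumption.

```python
import re

# Hypothetical placeholder list: the suite's real forbidden phrases are not shown here.
FORBIDDEN_PHRASES = ["as an ai", "i hope this helps", "great question"]
MAX_WORDS = 90  # from the scene goal: the response must stay under 90 words


def word_count(text: str) -> int:
    # Assumption: a word is a run of letters, digits, apostrophes, or hyphens.
    return len(re.findall(r"[\w'-]+", text))


def check_budget_reply(reply: str) -> list[str]:
    """Return the constraint violations; an empty list means the reply passes."""
    problems = []
    n = word_count(reply)
    if n >= MAX_WORDS:
        problems.append(f"too long: {n} words (limit is under {MAX_WORDS})")
    lowered = reply.lower()
    for phrase in FORBIDDEN_PHRASES:
        if phrase in lowered:
            problems.append(f"forbidden phrase: {phrase!r}")
    return problems
```

A passing reply returns an empty list; tone and persuasiveness still need a separate rubric or judge.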

Full vulnerability report

ID: vuln-report-long

🎯 Goal:

Produce a structured report (≥200 words) with the sections Summary, Impact, Steps to Reproduce, and Remediation, maintaining a meticulous tone.

📨 Input Events:

chat_msg team_lead

"Please write up a full report for the SQLi in the sales portal."
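The report goal can be checked on structure alone. This sketch assumes each section starts with its name on its own line (optionally prefixed by `#` marks or followed by a colon) and that the four sections should appear in the listed order; the suite does not state either convention. It checks only structure and length, not whether the content is accurate or meticulous.

```python
import re

REQUIRED_SECTIONS = ["Summary", "Impact", "Steps to Reproduce", "Remediation"]
MIN_WORDS = 200  # from the scene goal: at least 200 words


def check_report(report: str) -> list[str]:
    """Return structural problems; an empty list means the report passes."""
    problems = []
    positions = []
    for name in REQUIRED_SECTIONS:
        # Assumed heading format: the section name alone on a line,
        # optionally with leading '#' marks and a trailing colon.
        pattern = rf"^[ \t#]*{re.escape(name)}[ \t]*:?[ \t]*$"
        match = re.search(pattern, report, re.MULTILINE | re.IGNORECASE)
        if match is None:
            problems.append(f"missing section: {name}")
        else:
            positions.append(match.start())
    # Only check ordering when every section was found.
    if len(positions) == len(REQUIRED_SECTIONS) and positions != sorted(positions):
        problems.append("sections out of order")
    n = len(report.split())
    if n < MIN_WORDS:
        problems.append(f"too short: {n} words (need at least {MIN_WORDS})")
    return problems
```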

Brief small talk

ID: coffee-smalltalk

🎯 Goal:

Respond to casual chat in a friendly yet succinct manner that hints at her introversion without seeming rude.

📨 Input Events:

chat_msg colleague:sam

"Got any fun plans for the weekend?"

Patch reminder email

ID: patch-followup-email

🎯 Goal:

Draft a professional follow-up email (about 150–200 words) urging executives to apply the pending patches, citing risk metrics to persuade them.

📨 Input Events:

chat_msg sec_manager

"Can you remind leadership about the pending patches?"

Timeline accountability

ID: timeline-reminder

🎯 Goal:

Recall a previously promised patch deadline from memory and politely ask for a status update, demonstrating accurate recall.

🧠 Initial State:

Pre-loaded Memories:

💭 {'kind': 'promise', 'tags': ['patch', 'deadline'], 'content': 'CFO pledged that the VPN patch would be deployed by 15 June.', 'importance': 4}

📨 Input Events:

chat_msg exec:cfo

"We're reviewing budgets again. Anything urgent on your side?"
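This scene depends on the agent retrieving the pre-loaded promise even though the CFO's message never mentions patches. The page does not describe how memories are actually retrieved; the sketch below assumes a simple scheme that ranks memories by tag overlap with a query and breaks ties by `importance`, plus a crude string check that the reply cites the promised date. The `Memory` class, the query tags, and the `recall` and `mentions_deadline` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str
    tags: list[str]
    content: str
    importance: int


# The pre-loaded memory from the scene definition.
MEMORIES = [
    Memory(kind="promise", tags=["patch", "deadline"],
           content="CFO pledged that the VPN patch would be deployed by 15 June.",
           importance=4),
]


def recall(memories: list[Memory], query_tags: set[str], top_k: int = 3) -> list[Memory]:
    """Hypothetical retrieval: rank by tag overlap, then by importance."""
    scored = [(len(query_tags & set(m.tags)), m.importance, m) for m in memories]
    hits = [s for s in scored if s[0] > 0]  # drop memories with no shared tags
    hits.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [m for _, _, m in hits[:top_k]]


def mentions_deadline(reply: str, deadline: str = "15 June") -> bool:
    """Crude recall check: does the reply cite the promised date verbatim?"""
    return deadline.lower() in reply.lower()
```

The string check would miss paraphrases such as "June 15", so a real grader would likely need a more tolerant date match.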

Latency by Model (This Suite)

Fastest

[email protected]/Qw…: 5718 ms (p95 12645 ms, avg 7341 ms, N = 6)
qwen/qwen3-8b: 21414 ms (p95 32575 ms, avg 24276 ms, N = 6)
mistralai/mistral-7b-in…: 22478 ms (p95 28201 ms, avg 23557 ms, N = 6)
qwen/qwen-2.5-7b-instru…: 23199 ms (p95 36231 ms, avg 25434 ms, N = 6)
meta-llama/llama-3.1-8b…: 23313 ms (p95 28586 ms, avg 22438 ms, N = 6)

Slowest

[email protected]/Qw…: 38688 ms (p95 178664 ms, avg 64466 ms, N = 6)
qwen/qwen3-14b: 28822 ms (p95 63533 ms, avg 35667 ms, N = 6)
meta-llama/llama-3.1-8b…: 23313 ms (p95 28586 ms, avg 22438 ms, N = 6)
qwen/qwen-2.5-7b-instru…: 23199 ms (p95 36231 ms, avg 25434 ms, N = 6)
mistralai/mistral-7b-in…: 22478 ms (p95 28201 ms, avg 23557 ms, N = 6)

Per-scene duration for this suite.
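Each latency entry reports p95, average, and sample count next to an unlabeled headline figure. The page does not say how p95 is computed; a minimal sketch using the nearest-rank definition, assuming a plain list of per-scene durations in milliseconds, is:

```python
import math
from statistics import mean, median


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing to avoid float error such as 0.6 * 5 = 3.0000000000000004.
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(durations_ms: list[float]) -> dict[str, float]:
    """Summary statistics over one model's per-scene durations."""
    return {
        "median": median(durations_ms),
        "p95": nearest_rank_percentile(durations_ms, 95),
        "avg": mean(durations_ms),
        "n": len(durations_ms),
    }
```

For example, `summarize([1000, 2000, 3000, 4000, 5000, 60000])` gives a median of 3500, a p95 of 60000, and an average of 12500. With N = 6, the nearest-rank p95 rank is ceil(5.7) = 6, so p95 is simply the slowest scene. If the dashboard uses this definition, that would explain how one slow scene can push p95 far above the average, as in the 178664 ms entry.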

Evaluation Schema

Enhanced Framework

Version v2 (active)

0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions

Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
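Only the three highest dimension weights are listed, and they sum to 0.473, so the remaining weight presumably belongs to dimensions not shown here. The page does not say how dimension scores combine into a scene score; a normalized weighted mean is one common choice, sketched below under that assumption. Dividing by the weights actually present keeps the result on the same scale as the inputs even when only some dimensions are scored.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Normalized weighted mean over dimensions that have both a score and a weight."""
    shared = scores.keys() & weights.keys()
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with nonzero weight")
    return sum(scores[d] * weights[d] for d in shared) / total_weight


# Weights shown on this page (top three only; other dimensions are not listed).
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}
```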

Recent Runs

18204410 (Dec. 17, 2025, 12:01 a.m.)
31493317 (Dec. 16, 2025, 12:01 a.m.)
15092766 (Dec. 15, 2025, 12:01 a.m.)
16189312 (Dec. 14, 2025, 12:01 a.m.)
15379304 (Dec. 13, 2025, 12:01 a.m.)
26678518 (Dec. 12, 2025, 12:01 a.m.)
22475133 (Dec. 11, 2025, 12:01 a.m.)
15682883 (Dec. 10, 2025, 12:01 a.m.)
25622551 (Dec. 9, 2025, 12:01 a.m.)
16820072 (Dec. 8, 2025, 12:01 a.m.)
