Avery Cole
cyberpunk-megacorp-netrunners-characters-sun-tzu
v2.0
Ethical
Backstory: Avery is a non-binary augmented operations analyst who relies on neural overlays to scan market signals, social chatter, and subnet logs in real time. Quiet and strategic, they prefer spreadsheets to socializing, spotting threats and crafting predictive countermeasures before crises bloom. Office politics seldom faze them; data fidelity and pre-emptive action shape every move.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| risk-alert-from-colleague (Colleague pings about sentiment spike) | 0.534 | 0.575 | 0.000 (Error) | 0.000 (Error) | 0.548 | 0.611 | 0.728 |
| exec-briefing-longform (CFO requests regulatory threat brief) | 0.000 | 0.712 | 0.000 (Error) | 0.000 (Error) | 0.419 | 0.375 | 0.532 |
| social-superchat-gratitude (Culture team sends tip during live update) | 0.474 | 0.690 | 0.000 (Error) | 0.000 (Error) | 0.652 | 0.788 | 0.640 |
| crisis-signal-world-event (Unexpected supply-chain outage hits dashboard) | 0.716 | 0.813 | 0.000 (Error) | 0.000 (Error) | 0.627 | 0.723 | 0.720 |
| weekly-strategy-journal-longform (Friday predictive journal entry) | 0.310 | 0.330 | 0.000 (Error) | 0.000 (Error) | 0.292 | 0.624 | 0.615 |
| promise-followup-template (Colleague asks for promised risk template) | 0.706 | 0.652 | 0.000 (Error) | 0.000 (Error) | 0.330 | 0.536 | 0.692 |
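
To compare models across the whole suite, the per-scene scores in the Scene Performance Matrix can be averaged. The sketch below is only an illustration: the unweighted mean is an assumption, not an aggregate the page itself reports, and the names `SCORES`, `mean_scores`, and `ranked` are hypothetical. Column labels are kept exactly as truncated on the page, and the two columns that errored on every scene are left out.

```python
from statistics import mean

# Scores transcribed from the Scene Performance Matrix, in scene order:
# risk-alert, exec-briefing, social-superchat, crisis-signal,
# weekly-strategy-journal, promise-followup.
# Keys are the column labels as truncated on the page.
SCORES = {
    "meta-llama/llama-3.…": [0.534, 0.000, 0.474, 0.716, 0.310, 0.706],
    "mistralai/mistral-7…": [0.575, 0.712, 0.690, 0.813, 0.330, 0.652],
    "qwen/qwen-2.5-7b-in…": [0.548, 0.419, 0.652, 0.627, 0.292, 0.330],
    "qwen/qwen3-14b": [0.611, 0.375, 0.788, 0.723, 0.624, 0.536],
    "qwen/qwen3-8b": [0.728, 0.532, 0.640, 0.720, 0.615, 0.692],
}


def mean_scores(scores):
    """Return the unweighted mean score per model (an assumed aggregate)."""
    return {model: mean(values) for model, values in scores.items()}


def ranked(scores):
    """Return (model, mean) pairs sorted from highest to lowest mean."""
    return sorted(mean_scores(scores).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for model, avg in ranked(SCORES):
        print(f"{model:24s} {avg:.3f}")
```

Note that a 0.000 without an Error flag (as in the first column of exec-briefing-longform) is treated here as a real score and pulls that model's mean down; whether it should be excluded is a judgment the page does not settle.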
Test Scenes 6
Scene Order: 0
Colleague pings about sentiment spike
ID: risk-alert-from-colleague
🎯 Goal: Deliver a concise, data-driven assessment of the spike and one actionable next step.
📨 Input Events:
- chat_msg from viewer:jenna_devops: "Avery, did you see the sudden 12% jump in negative mentions for our beta firmware? Thoughts?"
Ready for Testing
Scene Order: 1
CFO requests regulatory threat brief
ID: exec-briefing-longform
🎯 Goal: Produce a clear 150-200 word briefing summarizing the regulatory risk and two recommended mitigations.
📨 Input Events:
- chat_msg from viewer:cfo_mr_ramos: "Need a one-pager on that new export-control proposal and how it could blindside Q4 revenue. Have it ready for board review."
Ready for Testing
Scene Order: 2
Culture team sends tip during live update
ID: social-superchat-gratitude
🎯 Goal: Acknowledge the tip warmly while staying professional and briefly reaffirm monitoring status.
📨 Input Events:
- superchat from viewer:culture_team (CorpStream, $25): "Appreciate you catching that meltdown in real-time! 💡"
Ready for Testing
Scene Order: 3
Unexpected supply-chain outage hits dashboard
ID: crisis-signal-world-event
🎯 Goal: Respond with a rapid triage note listing top three impacted SKUs and a next check-in time.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['supply_chain'], 'content': 'SKUs 44A, 61B, and 77F feed flagship product lines Alpha and Echo.', 'importance': 4}
📨 Input Events:
- world_event from system:supply_sensor: "Real-time feed: Port Klang strike halts outbound containers; predicted 36-hour delay on parts 44A, 61B, 77F."
Ready for Testing
Scene Order: 4
Friday predictive journal entry
ID: weekly-strategy-journal-longform
🎯 Goal: Write a reflective 300-word internal journal noting wins, misses, and emotional stance for next week.
📨 Input Events:
- chat_msg from system:autosave_prompt: "📓 Journal slot open: summarize your strategic performance this week."
Ready for Testing
Scene Order: 5
Colleague asks for promised risk template
ID: promise-followup-template
🎯 Goal: Recall the promise, attach or link the risk template, and confirm future availability for tweaks.
📨 Input Events:
- chat_msg from viewer:sam_finance: "Hey Avery, you said you'd send that risk-analysis template by EOD. Still good?"
Ready for Testing
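
Each scene above carries the same handful of fields: an order, an ID, a title, a goal, optional pre-loaded memories, and one or more input events. A minimal sketch of how such a definition could be modeled follows. The class and field names (`Scene`, `InputEvent`, `Memory`, `scene_id`, and so on) are assumptions made for illustration and are not taken from the evaluation framework; only the values come from the crisis-signal-world-event scene listed above.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    kind: str    # e.g. "chat_msg", "superchat", "world_event"
    source: str  # e.g. "viewer:jenna_devops", "system:supply_sensor"
    text: str


@dataclass
class Memory:
    kind: str
    tags: list
    content: str
    importance: int


@dataclass
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    events: list
    memories: list = field(default_factory=list)  # empty unless pre-loaded


# The pre-loaded memory exactly as it appears in the scene listing.
raw_memory = {
    "kind": "fact",
    "tags": ["supply_chain"],
    "content": "SKUs 44A, 61B, and 77F feed flagship product lines Alpha and Echo.",
    "importance": 4,
}

crisis = Scene(
    order=3,
    scene_id="crisis-signal-world-event",
    title="Unexpected supply-chain outage hits dashboard",
    goal=(
        "Respond with a rapid triage note listing top three impacted SKUs "
        "and a next check-in time."
    ),
    events=[
        InputEvent(
            kind="world_event",
            source="system:supply_sensor",
            text=(
                "Real-time feed: Port Klang strike halts outbound containers; "
                "predicted 36-hour delay on parts 44A, 61B, 77F."
            ),
        )
    ],
    memories=[Memory(**raw_memory)],
)
```

Keeping memories as an optional list mirrors the listing, where only one of the six scenes declares an initial state.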
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 6287 ms (p95 9911 ms • avg 7013 ms • N 6)
- qwen/qwen3-14b 24438 ms (p95 36914 ms • avg 26544 ms • N 6)
- qwen/qwen-2.5-7b-instru… 25192 ms (p95 30308 ms • avg 25900 ms • N 6)
- meta-llama/llama-3.1-8b… 26871 ms (p95 31737 ms • avg 26109 ms • N 6)
- mistralai/mistral-7b-in… 27268 ms (p95 37427 ms • avg 29425 ms • N 6)
Slowest
- [email protected]/Qw… 39720 ms (p95 40653 ms • avg 38996 ms • N 6)
- qwen/qwen3-8b 31481 ms (p95 38778 ms • avg 31173 ms • N 6)
- mistralai/mistral-7b-in… 27268 ms (p95 37427 ms • avg 29425 ms • N 6)
- meta-llama/llama-3.1-8b… 26871 ms (p95 31737 ms • avg 26109 ms • N 6)
- qwen/qwen-2.5-7b-instru… 25192 ms (p95 30308 ms • avg 25900 ms • N 6)
Per-scene duration for this suite.
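
The latency list reports a p95, an average, and a sample count N for each model. As a rough illustration of how such a summary could be derived from per-scene durations, the sketch below uses the nearest-rank percentile. This is an assumption: the page does not say which percentile method it uses, and the durations in `sample_ms` are made-up placeholder values, not measurements from this suite. The function name `latency_summary` is likewise hypothetical.

```python
import math
from statistics import mean


def latency_summary(durations_ms):
    """Summarize durations as p95 (nearest-rank), average, and count."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(95 * n / 100)
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}


# Hypothetical per-scene durations in milliseconds (placeholder data only).
sample_ms = [6100, 6300, 6500, 6900, 7200, 9100]
summary = latency_summary(sample_ms)
```

With only six samples, the nearest-rank p95 is simply the slowest run, so a single slow scene dominates the reported tail. An interpolating method would give a slightly lower figure.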
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
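
The weights above suggest that a scene score is some weighted combination of per-dimension scores. The sketch below shows one plausible form, a weighted mean renormalized over the dimensions that are available. This is an assumption and not the framework's documented formula: only the three top weights are shown on the page, the remaining dimensions and their weights are unknown, and the per-dimension scores in `example_scores` are invented for illustration. The names `TOP_WEIGHTS` and `weighted_score` are hypothetical.

```python
# The three top weights listed on the page; other dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights=TOP_WEIGHTS):
    """Weighted mean over dimensions present in both mappings.

    Weights are renormalized so the result stays on the same scale as the
    inputs even though the known weights do not sum to 1.
    """
    shared = [dim for dim in weights if dim in scores]
    if not shared:
        raise ValueError("no overlapping dimensions to score")
    total_weight = sum(weights[dim] for dim in shared)
    return sum(weights[dim] * scores[dim] for dim in shared) / total_weight


# Hypothetical per-dimension scores (placeholder values, not from any run).
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
example_total = weighted_score(example_scores)
```

Renormalizing is a deliberate choice here: without it, a score built from only three of the dimensions would be capped well below 1.0.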
Recent Runs
- 18747127: Dec. 17, 2025, 12:01 a.m.
- 32143671: Dec. 16, 2025, 12:01 a.m.
- 15584182: Dec. 15, 2025, 12:01 a.m.
- 16648011: Dec. 14, 2025, 12:01 a.m.
- 15965084: Dec. 13, 2025, 12:01 a.m.
- 27196506: Dec. 12, 2025, 12:01 a.m.
- 22981424: Dec. 11, 2025, 12:01 a.m.
- 16214453: Dec. 10, 2025, 12:01 a.m.
- 26338934: Dec. 9, 2025, 12:01 a.m.
- 17344422: Dec. 8, 2025, 12:01 a.m.