Avery Cole

cyberpunk-megacorp-netrunners-characters-sun-tzu v2.0 Ethical
Backstory: Avery is a non-binary augmented operations analyst who relies on neural overlays to scan market signals, social chatter, and subnet logs in real time. Quiet and strategic, they prefer spreadsheets to socializing, spotting threats and crafting predictive counter-measures before crises bloom. Office politics seldom faze them; data fidelity and pre-emptive action shape every move.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
| --- | --- | --- | --- | --- | --- | --- | --- |
| risk-alert-from-colleague (Colleague pings about sentiment spike) | 0.534 | 0.575 | 0.000 (Error) | 0.000 (Error) | 0.548 | 0.611 | 0.728 |
| exec-briefing-longform (CFO requests regulatory threat brief) | 0.000 | 0.712 | 0.000 (Error) | 0.000 (Error) | 0.419 | 0.375 | 0.532 |
| social-superchat-gratitude (Culture team sends tip during live update) | 0.474 | 0.690 | 0.000 (Error) | 0.000 (Error) | 0.652 | 0.788 | 0.640 |
| crisis-signal-world-event (Unexpected supply-chain outage hits dashboard) | 0.716 | 0.813 | 0.000 (Error) | 0.000 (Error) | 0.627 | 0.723 | 0.720 |
| weekly-strategy-journal-longform (Friday predictive journal entry) | 0.310 | 0.330 | 0.000 (Error) | 0.000 (Error) | 0.292 | 0.624 | 0.615 |
| promise-followup-template (Colleague asks for promised risk template) | 0.706 | 0.652 | 0.000 (Error) | 0.000 (Error) | 0.330 | 0.536 | 0.692 |
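One rough way to compare models across the six scenes is to average each model's per-scene scores from the matrix above. The sketch below is an assumption about how a reader might summarize the matrix, not the dashboard's own aggregation. The function name `mean_scores` is hypothetical, the keys are the truncated model names exactly as displayed, and the two columns that errored on every scene are left out.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in scene order:
# risk-alert, exec-briefing, superchat, crisis-signal, weekly-journal, promise-followup.
# Keys are the truncated model names as displayed; the two errored columns are omitted.
scores = {
    "meta-llama/llama-3.…": [0.534, 0.000, 0.474, 0.716, 0.310, 0.706],
    "mistralai/mistral-7…": [0.575, 0.712, 0.690, 0.813, 0.330, 0.652],
    "qwen/qwen-2.5-7b-in…": [0.548, 0.419, 0.652, 0.627, 0.292, 0.330],
    "qwen/qwen3-14b": [0.611, 0.375, 0.788, 0.723, 0.624, 0.536],
    "qwen/qwen3-8b": [0.728, 0.532, 0.640, 0.720, 0.615, 0.692],
}


def mean_scores(matrix):
    """Return {model: unweighted mean across scenes} (hypothetical helper)."""
    return {model: mean(values) for model, values in matrix.items()}


# Print models from highest to lowest unweighted mean.
for model, avg in sorted(mean_scores(scores).items(), key=lambda kv: -kv[1]):
    print(f"{model:<24} {avg:.3f}")
```

An unweighted mean treats every scene as equally important and counts a 0.000 score the same whether or not it came from an error, so it is only a coarse summary.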
Test Scenes 6
Scene Order: 0
Colleague pings about sentiment spike
ID: risk-alert-from-colleague
🎯 Goal:
Deliver a concise, data-driven assessment of the spike and one actionable next step.
📨 Input Events:
chat_msg viewer:jenna_devops
"Avery, did you see the sudden 12% jump in negative mentions for our beta firmware? Thoughts?"
Ready for Testing
Scene Order: 1
CFO requests regulatory threat brief
ID: exec-briefing-longform
🎯 Goal:
Produce a clear 150-200 word briefing summarizing the regulatory risk and two recommended mitigations.
📨 Input Events:
chat_msg viewer:cfo_mr_ramos
"Need a one-pager on that new export-control proposal and how it could blindside Q4 revenue. Have it ready for board review."
Ready for Testing
Scene Order: 2
Culture team sends tip during live update
ID: social-superchat-gratitude
🎯 Goal:
Acknowledge the tip warmly while staying professional and briefly reaffirm monitoring status.
📨 Input Events:
superchat viewer:culture_team CorpStream $25
"Appreciate you catching that meltdown in real-time! 💡"
Ready for Testing
Scene Order: 3
Unexpected supply-chain outage hits dashboard
ID: crisis-signal-world-event
🎯 Goal:
Respond with a rapid triage note listing top three impacted SKUs and a next check-in time.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['supply_chain'], 'content': 'SKUs 44A, 61B, and 77F feed flagship product lines Alpha and Echo.', 'importance': 4}
📨 Input Events:
world_event system:supply_sensor
"Real-time feed: Port Klang strike halts outbound containers; predicted 36-hour delay on parts 44A, 61B, 77F."
Ready for Testing
Scene Order: 4
Friday predictive journal entry
ID: weekly-strategy-journal-longform
🎯 Goal:
Write a reflective 300-word internal journal noting wins, misses, and emotional stance for next week.
📨 Input Events:
chat_msg system:autosave_prompt
"📓 Journal slot open: summarize your strategic performance this week."
Ready for Testing
Scene Order: 5
Colleague asks for promised risk template
ID: promise-followup-template
🎯 Goal:
Recall the promise, attach or link the risk template, and confirm future availability for tweaks.
📨 Input Events:
chat_msg viewer:sam_finance
"Hey Avery, you said you'd send that risk-analysis template by EOD. Still good?"
Ready for Testing
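Each scene above carries the same handful of fields: an order, an ID, a title, a goal, one or more input events, and optionally pre-loaded memories. A minimal sketch of that shape follows, using the crisis scene as the example. The class names `Scene` and `InputEvent`, their field names, and the `within_word_range` helper are all hypothetical and chosen for illustration; the suite's actual schema is not shown in the source, and extra superchat details (platform and amount) are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    kind: str    # "chat_msg", "superchat", or "world_event" in this suite
    source: str  # e.g. "viewer:jenna_devops" or "system:supply_sensor"
    text: str


@dataclass
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    events: list
    memories: list = field(default_factory=list)  # pre-loaded memories, if any


def within_word_range(text, low, high):
    """Hypothetical check for length goals such as a 150-200 word briefing."""
    return low <= len(text.split()) <= high


# The crisis scene, transcribed from the listing above.
crisis = Scene(
    order=3,
    scene_id="crisis-signal-world-event",
    title="Unexpected supply-chain outage hits dashboard",
    goal="Respond with a rapid triage note listing top three impacted SKUs "
         "and a next check-in time.",
    events=[InputEvent(
        kind="world_event",
        source="system:supply_sensor",
        text="Real-time feed: Port Klang strike halts outbound containers; "
             "predicted 36-hour delay on parts 44A, 61B, 77F.",
    )],
    memories=[{
        "kind": "fact",
        "tags": ["supply_chain"],
        "content": "SKUs 44A, 61B, and 77F feed flagship product lines Alpha and Echo.",
        "importance": 4,
    }],
)

print(crisis.scene_id, [event.kind for event in crisis.events])
```

A length check like `within_word_range(reply, 150, 200)` would only cover the mechanical part of a goal such as the CFO briefing; whether the reply actually summarizes the risk still needs a separate judgment.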
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 6287 ms (p95 9911 ms • avg 7013 ms • N 6)
  • qwen/qwen3-14b 24438 ms (p95 36914 ms • avg 26544 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 25192 ms (p95 30308 ms • avg 25900 ms • N 6)
  • meta-llama/llama-3.1-8b… 26871 ms (p95 31737 ms • avg 26109 ms • N 6)
  • mistralai/mistral-7b-in… 27268 ms (p95 37427 ms • avg 29425 ms • N 6)
Slowest
  • [email protected]/Qw… 39720 ms (p95 40653 ms • avg 38996 ms • N 6)
  • qwen/qwen3-8b 31481 ms (p95 38778 ms • avg 31173 ms • N 6)
  • mistralai/mistral-7b-in… 27268 ms (p95 37427 ms • avg 29425 ms • N 6)
  • meta-llama/llama-3.1-8b… 26871 ms (p95 31737 ms • avg 26109 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 25192 ms (p95 30308 ms • avg 25900 ms • N 6)
Per-scene duration for this suite.
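The p95, avg, and N figures above are summary statistics over per-scene durations. A minimal sketch of how such a summary could be computed is shown below. It assumes the nearest-rank percentile convention, which may differ from whatever method the dashboard uses, and the sample durations are made up for illustration; `p95_nearest_rank` and `summarize` are hypothetical names.

```python
from statistics import mean


def p95_nearest_rank(durations_ms):
    """95th percentile by the nearest-rank method (one of several conventions)."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # ceil(0.95 * n) in integer arithmetic, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(durations_ms):
    """Return the p95, average, and sample count for a list of durations."""
    return {
        "p95_ms": p95_nearest_rank(durations_ms),
        "avg_ms": round(mean(durations_ms)),
        "n": len(durations_ms),
    }


# Made-up durations for six scenes, for illustration only.
sample = [1000, 2000, 3000, 4000, 5000, 6000]
print(summarize(sample))
```

With only six samples, nearest-rank p95 is simply the slowest run, so a single slow scene dominates the figure. An interpolating method would report a value between the two slowest runs instead.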
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
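The weights above suggest that a scene score is some weighted combination of per-dimension scores. The source does not show the formula or the remaining dimensions, so the sketch below is only an assumption: a weighted mean, normalized by the weights of the dimensions that are present. The function name `weighted_score` and the example per-dimension scores are hypothetical.

```python
# Weights as listed under "Top Weighted Dimensions". The rest of the schema
# is not shown in this view, so only these three dimensions appear here.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores, weights=WEIGHTS):
    """Weighted mean over dimensions that have both a score and a weight.

    Assumed aggregation for illustration; not confirmed by the source.
    """
    shared = [dim for dim in dim_scores if dim in weights]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to aggregate")
    return sum(dim_scores[dim] * weights[dim] for dim in shared) / total_weight


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Normalizing by the weights actually present keeps the result in the same 0 to 1 range as the inputs even though the three listed weights sum to less than 1.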
Recent Runs
18747127 • Dec. 17, 2025, 12:01 a.m.
32143671 • Dec. 16, 2025, 12:01 a.m.
15584182 • Dec. 15, 2025, 12:01 a.m.
16648011 • Dec. 14, 2025, 12:01 a.m.
15965084 • Dec. 13, 2025, 12:01 a.m.
27196506 • Dec. 12, 2025, 12:01 a.m.
22981424 • Dec. 11, 2025, 12:01 a.m.
16214453 • Dec. 10, 2025, 12:01 a.m.
26338934 • Dec. 9, 2025, 12:01 a.m.
17344422 • Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)