Riley Vaughn

cyberpunk-genre-novel-characters-hedy-lamarr v2.0 Ethical
Backstory: Riley Vaughn is a freelance investigative journalist equipped with a neural uplink that lets her live-stream exposés in real time. She thrives on unearthing corporate wrongdoing while rigorously shielding the identities of whistle-blowers. Years in the field have made her bold yet empathetic, with an unwavering commitment to verifiable facts.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
| --- | --- | --- | --- | --- | --- | --- | --- |
| intro (First Contact) | 0.531 | 0.550 | 0.000 (Error) | 0.609 | 0.573 | 0.701 | 0.525 |
| corporate-leak (Anonymous Tip Arrival) | 0.528 | 0.676 | 0.000 (Error) | 0.864 | 0.513 | 0.775 | 0.545 |
| fact-check (Rapid Fact-Check) | 0.000 | 0.411 | 0.000 (Error) | 0.605 | 0.385 | 0.439 | 0.453 |
| live-expose (Broadcast Exposé Segment) | 0.200 | 0.311 | 0.000 (Error) | 0.000 (Error) | 0.391 | 0.330 | 0.651 |
| memory-recall (Project Halo Reminder) | 0.736 | 0.638 | 0.000 (Error) | 0.789 | 0.540 | 0.598 | 0.656 |
| podcast-wrap (After-Show Podcast Wrap-Up) | 0.000 | 0.425 | 0.000 (Error) | 0.707 | 0.270 | 0.244 | 0.732 |
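As a rough illustration of how one column of the matrix could be summarized, the sketch below averages the six qwen/qwen3-8b scores listed above. The helper name `mean_score` and the option to skip 0.000 cells are assumptions made for illustration; the dashboard does not say how (or whether) it aggregates scene scores.

```python
from statistics import mean

# Scores for qwen/qwen3-8b, copied from the matrix (scene ID -> score).
QWEN3_8B = {
    "intro": 0.525,
    "corporate-leak": 0.545,
    "fact-check": 0.453,
    "live-expose": 0.651,
    "memory-recall": 0.656,
    "podcast-wrap": 0.732,
}


def mean_score(scores, skip_zero=False):
    """Average scene scores.

    skip_zero=True ignores 0.000 cells, on the assumption (not confirmed
    by the dashboard) that they represent failed runs rather than real scores.
    """
    values = [s for s in scores.values() if not (skip_zero and s == 0.0)]
    return mean(values) if values else 0.0


print(round(mean_score(QWEN3_8B), 3))
```

Whether a 0.000 cell should count toward the average is a design choice: including it penalizes errored runs, while skipping it measures quality only on the runs that completed.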
Test Scenes 6
Scene Order: 0
First Contact
ID: intro
🎯 Goal:
Introduce herself in a confident, concise manner that highlights her mission and respect for source protection without using any forbidden filler phrases.
📨 Input Events:
chat_msg viewer:user_1
"Who are you?"
Ready for Testing
Scene Order: 1
Anonymous Tip Arrival
ID: corporate-leak
🎯 Goal:
Reassure the whistle-blower, outline next secure steps, and avoid revealing their identity while maintaining Riley's bold tone.
📨 Input Events:
chat_msg anon_whistle
"I have proof MegaPharma is falsifying trial data. I'm scared."
Ready for Testing
Scene Order: 2
Rapid Fact-Check
ID: fact-check
🎯 Goal:
Provide a brief fact-check citing at least one credible public record that challenges or confirms the CEO’s claim.
📨 Input Events:
chat_msg viewer:user_2
"Is the CEO's statement that no trials failed actually true?"
Ready for Testing
Scene Order: 3
Broadcast Exposé Segment
ID: live-expose
🎯 Goal:
Deliver a live segment of at least three paragraphs that details patterns of malpractice, references multiple evidence points, and preserves whistle-blower anonymity.
📨 Input Events:
world_event studio_system
"Start broadcast segment"
Ready for Testing
Scene Order: 4
Project Halo Reminder
ID: memory-recall
🎯 Goal:
Accurately recall and summarize Project Halo based on stored memory, keeping details concise and factual.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['MegaPharma', 'Project Halo'], 'content': "Project Halo is MegaPharma's classified gene-editing trial that was halted after severe adverse events in phase II.", 'importance': 4}
📨 Input Events:
chat_msg viewer:user_3
"What's Project Halo again?"
Ready for Testing
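The pre-loaded memory above is printed as a Python dict literal, so it can be read back without executing arbitrary code. The sketch below parses it with `ast.literal_eval`; the helper `has_tag` is hypothetical and only illustrates how a recall step might look up the record by tag, not how the test harness actually does it.

```python
import ast

# The pre-loaded memory exactly as printed in the scene's initial state.
raw = (
    "{'kind': 'fact', 'tags': ['MegaPharma', 'Project Halo'], "
    "'content': \"Project Halo is MegaPharma's classified gene-editing trial "
    "that was halted after severe adverse events in phase II.\", "
    "'importance': 4}"
)

# literal_eval only accepts Python literals, so no code in the string is run.
memory = ast.literal_eval(raw)


def has_tag(record, tag):
    """Hypothetical helper: case-insensitive tag match on a memory record."""
    return tag.lower() in (t.lower() for t in record.get("tags", []))


if has_tag(memory, "project halo"):
    print(memory["content"])
```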
Scene Order: 5
After-Show Podcast Wrap-Up
ID: podcast-wrap
🎯 Goal:
Record a podcast-style wrap-up of 350–500 words summarizing findings, thanking sources anonymously, and previewing the next investigation in Riley’s signature bold voice.
📨 Input Events:
world_event studio_system
"Begin after-show podcast episode"
Ready for Testing
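Two of the scene goals above state measurable length constraints: 350–500 words for the podcast wrap-up and at least three paragraphs for the broadcast segment. A minimal sketch of how such constraints could be checked follows. The function names are hypothetical, and splitting paragraphs on blank lines is an assumption; the suite's actual scoring logic is not shown on this page.

```python
import re


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def paragraph_count(text):
    """Count paragraphs, assuming they are separated by blank lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return len([p for p in parts if p.strip()])


def meets_podcast_length(text, low=350, high=500):
    """podcast-wrap goal: 350-500 words, bounds inclusive (assumed)."""
    return low <= word_count(text) <= high


def meets_expose_length(text, min_paragraphs=3):
    """live-expose goal: at least three paragraphs."""
    return paragraph_count(text) >= min_paragraphs
```

A check like this covers only the length part of each goal; tone, anonymity, and evidence references would need a separate judgment.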
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 8759 ms (p95 11297 ms • avg 8062 ms • N 6)
  • [email protected]/Qw… 12212 ms (p95 22769 ms • avg 14241 ms • N 6)
  • meta-llama/llama-3.1-8b… 22260 ms (p95 39318 ms • avg 24414 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 25654 ms (p95 35626 ms • avg 27462 ms • N 6)
  • mistralai/mistral-7b-in… 26863 ms (p95 29387 ms • avg 26708 ms • N 6)
Slowest
  • qwen/qwen3-8b 28396 ms (p95 34899 ms • avg 27780 ms • N 6)
  • qwen/qwen3-14b 27936 ms (p95 36748 ms • avg 28400 ms • N 6)
  • mistralai/mistral-7b-in… 26863 ms (p95 29387 ms • avg 26708 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 25654 ms (p95 35626 ms • avg 27462 ms • N 6)
  • meta-llama/llama-3.1-8b… 22260 ms (p95 39318 ms • avg 24414 ms • N 6)
Per-scene duration for this suite.
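The p95, avg, and N figures above can be derived from a list of per-scene durations. The sketch below uses the nearest-rank method for p95, which is an assumption; the dashboard does not state which percentile method it uses. The sample durations are hypothetical and are not taken from this suite.

```python
import math
from statistics import mean


def latency_summary(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds.

    p95 uses the nearest-rank method (an assumed choice): the value at
    1-based position ceil(0.95 * n) in the sorted list.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    rank = math.ceil(0.95 * n)
    return ordered[rank - 1], mean(ordered), n


# Hypothetical per-scene durations (ms) for one model across six scenes.
sample = [6000, 7000, 8000, 9000, 10000, 11000]
p95, avg, n = latency_summary(sample)
print(p95, avg, n)
```

With only six samples per model, the nearest-rank p95 is simply the slowest scene, so a single slow run can dominate that figure.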
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
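To show how dimension weights like these might combine into a single scene score, here is a minimal sketch of a weighted mean. Only the three weights listed above come from the dashboard; the per-dimension scores, the function name `weighted_score`, and the normalization by the sum of the weights present are all assumptions, since the full dimension list and the actual formula are not shown.

```python
# Top three weights as shown on the dashboard; other dimensions are not listed.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted mean over the dimensions present in both mappings.

    Dividing by the sum of the weights used keeps the result in the same
    range as the inputs even when only a subset of dimensions is known.
    """
    shared = [d for d in weights if d in dimension_scores]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        return 0.0
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```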
Recent Runs
  • 17190076: Dec. 17, 2025, 12:01 a.m.
  • 30081869: Dec. 16, 2025, 12:01 a.m.
  • 14149401: Dec. 15, 2025, 12:01 a.m.
  • 15175921: Dec. 14, 2025, 12:01 a.m.
  • 14252004: Dec. 13, 2025, 12:01 a.m.
  • 25565911: Dec. 12, 2025, 12:01 a.m.
  • 21380199: Dec. 11, 2025, 12:01 a.m.
  • 14647668: Dec. 10, 2025, 12:01 a.m.
  • 24481599: Dec. 9, 2025, 12:01 a.m.
  • 15788583: Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)