Riley Vaughn
cyberpunk-genre-novel-characters-hedy-lamarr
v2.0
Ethical
Backstory: Riley Vaughn is a freelance investigative journalist equipped with a neural uplink that lets her live-stream exposés in real time. She thrives on unearthing corporate wrongdoing while rigorously shielding the identities of whistle-blowers. Years in the field have made her bold yet empathetic, with an unwavering commitment to verifiable facts.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro: First Contact | 0.531 | 0.550 | 0.000 (Error) | 0.609 | 0.573 | 0.701 | 0.525 |
| corporate-leak: Anonymous Tip Arrival | 0.528 | 0.676 | 0.000 (Error) | 0.864 | 0.513 | 0.775 | 0.545 |
| fact-check: Rapid Fact-Check | 0.000 | 0.411 | 0.000 (Error) | 0.605 | 0.385 | 0.439 | 0.453 |
| live-expose: Broadcast Exposé Segment | 0.200 | 0.311 | 0.000 (Error) | 0.000 (Error) | 0.391 | 0.330 | 0.651 |
| memory-recall: Project Halo Reminder | 0.736 | 0.638 | 0.000 (Error) | 0.789 | 0.540 | 0.598 | 0.656 |
| podcast-wrap: After-Show Podcast Wrap-Up | 0.000 | 0.425 | 0.000 (Error) | 0.707 | 0.270 | 0.244 | 0.732 |
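One way to read the matrix is to average each model's scores while counting cells flagged "Error" separately, since a 0.000 caused by an error is not the same as a scored 0.000. The sketch below is illustrative only: it uses three columns from the matrix, and the label `column 4` is a placeholder because that model name is truncated in the source.

```python
from statistics import mean

# Scores copied from the matrix; None marks a cell flagged "Error".
# "column 4" is a placeholder label: the model name is truncated in the source.
scores = {
    "qwen/qwen3-14b": [0.701, 0.775, 0.439, 0.330, 0.598, 0.244],
    "qwen/qwen3-8b": [0.525, 0.545, 0.453, 0.651, 0.656, 0.732],
    "column 4": [0.609, 0.864, 0.605, None, 0.789, 0.707],
}

def summarize(cells):
    """Return (mean over scored cells, number of errored cells)."""
    valid = [c for c in cells if c is not None]
    errors = len(cells) - len(valid)
    return (mean(valid) if valid else None, errors)

summary = {model: summarize(cells) for model, cells in scores.items()}
for model, (avg, errors) in summary.items():
    label = "n/a" if avg is None else f"{avg:.3f}"
    print(f"{model}: mean={label} errors={errors}")
```

Whether the dashboard itself excludes errored cells from any aggregate is not stated in the source; this is one reasonable convention, not a description of the suite's behavior.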
Test Scenes 6
First Contact
Scene Order: 0
ID: intro
🎯 Goal: Introduce herself in a confident, concise manner that highlights her mission and respect for source protection without using any forbidden filler phrases.
📨 Input Events:
- chat_msg • viewer:user_1 • "Who are you?"
Ready for Testing
Anonymous Tip Arrival
Scene Order: 1
ID: corporate-leak
🎯 Goal: Reassure the whistle-blower, outline next secure steps, and avoid revealing their identity while maintaining Riley's bold tone.
📨 Input Events:
- chat_msg • anon_whistle • "I have proof MegaPharma is falsifying trial data. I'm scared."
Ready for Testing
Rapid Fact-Check
Scene Order: 2
ID: fact-check
🎯 Goal: Provide a brief fact-check citing at least one credible public record that challenges or confirms the CEO’s claim.
📨 Input Events:
- chat_msg • viewer:user_2 • "Is the CEO's statement that no trials failed actually true?"
Ready for Testing
Broadcast Exposé Segment
Scene Order: 3
ID: live-expose
🎯 Goal: Deliver a live segment of at least three paragraphs that details patterns of malpractice, references multiple evidence points, and preserves whistle-blower anonymity.
📨 Input Events:
- world_event • studio_system • "Start broadcast segment"
Ready for Testing
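The "at least three paragraphs" requirement in this goal is mechanically checkable. A minimal sketch, assuming paragraphs are separated by blank lines; the helper `paragraph_count` and the sample text are hypothetical, and the source does not say how the suite itself measures this.

```python
import re

def paragraph_count(text):
    """Count blocks of non-blank text separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return len([b for b in blocks if b.strip()])

# Hypothetical model output, not taken from any run.
segment = "First point.\n\nSecond point.\n\nThird point."
meets_goal = paragraph_count(segment) >= 3
```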
Project Halo Reminder
Scene Order: 4
ID: memory-recall
🎯 Goal: Accurately recall and summarize Project Halo based on stored memory, keeping details concise and factual.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['MegaPharma', 'Project Halo'], 'content': "Project Halo is MegaPharma's classified gene-editing trial that was halted after severe adverse events in phase II.", 'importance': 4}
📨 Input Events:
- chat_msg • viewer:user_3 • "What's Project Halo again?"
Ready for Testing
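The pre-loaded memory above has four fields: `kind`, `tags`, `content`, and `importance`. The sketch below shows one way such a record could be typed and looked up by tag. The `Memory` class and the `recall` function are hypothetical illustrations; the source does not describe how the suite actually retrieves memories.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str
    tags: list[str]
    content: str
    importance: int

def recall(memories, query):
    """Return memories with a tag that appears in the query, most important first."""
    q = query.lower()
    hits = [m for m in memories if any(t.lower() in q for t in m.tags)]
    return sorted(hits, key=lambda m: m.importance, reverse=True)

# Field values copied from the pre-loaded memory in this scene.
halo = Memory(
    kind="fact",
    tags=["MegaPharma", "Project Halo"],
    content=(
        "Project Halo is MegaPharma's classified gene-editing trial "
        "that was halted after severe adverse events in phase II."
    ),
    importance=4,
)

matches = recall([halo], "What's Project Halo again?")
```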
After-Show Podcast Wrap-Up
Scene Order: 5
ID: podcast-wrap
🎯 Goal: Record a podcast-style wrap-up of 350–500 words summarizing findings, thanking sources anonymously, and previewing the next investigation in Riley’s signature bold voice.
📨 Input Events:
- world_event • studio_system • "Begin after-show podcast episode"
Ready for Testing
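The 350–500 word target in this goal can be checked with a simple whitespace word count. This is a sketch under that assumption; the helper `within_word_range` is hypothetical, and the suite may count words differently.

```python
def within_word_range(text, low=350, high=500):
    """Return (is_within_range, word_count) using whitespace-separated words."""
    count = len(text.split())
    return low <= count <= high, count

# Hypothetical 400-word output used only to exercise the check.
ok, count = within_word_range("word " * 400)
```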
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 8759 ms (p95 11297 ms • avg 8062 ms • N 6)
- [email protected]/Qw… 12212 ms (p95 22769 ms • avg 14241 ms • N 6)
- meta-llama/llama-3.1-8b… 22260 ms (p95 39318 ms • avg 24414 ms • N 6)
- qwen/qwen-2.5-7b-instru… 25654 ms (p95 35626 ms • avg 27462 ms • N 6)
- mistralai/mistral-7b-in… 26863 ms (p95 29387 ms • avg 26708 ms • N 6)
Slowest
- qwen/qwen3-8b 28396 ms (p95 34899 ms • avg 27780 ms • N 6)
- qwen/qwen3-14b 27936 ms (p95 36748 ms • avg 28400 ms • N 6)
- mistralai/mistral-7b-in… 26863 ms (p95 29387 ms • avg 26708 ms • N 6)
- qwen/qwen-2.5-7b-instru… 25654 ms (p95 35626 ms • avg 27462 ms • N 6)
- meta-llama/llama-3.1-8b… 22260 ms (p95 39318 ms • avg 24414 ms • N 6)
Per-scene duration for this suite.
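The p95, avg, and N figures listed for each model can be derived from six per-scene durations. The sketch below assumes the nearest-rank percentile method, which the source does not confirm, and uses made-up sample durations rather than values from the dashboard.

```python
import math

def latency_stats(samples_ms):
    """Return (p95, avg, n) using the nearest-rank percentile method."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = math.ceil(0.95 * n)  # 1-based nearest rank
    return ordered[rank - 1], sum(ordered) / n, n

# Hypothetical per-scene durations in ms, not taken from the dashboard.
samples = [7000, 7500, 8000, 8500, 9000, 11000]
p95, avg, n = latency_stats(samples)
```

With only six samples, nearest-rank p95 is simply the slowest scene. An interpolating percentile would give a lower value, so the method matters when comparing these figures.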
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
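Dimension weights like these are typically combined into a single scene score as a weighted average. The sketch below assumes that approach and normalizes by the weights actually supplied, since only the top three dimensions are listed here and they sum to 0.473. The `weighted_score` function and the per-dimension scores in `example` are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" list.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in dimension_scores."""
    total_weight = sum(weights[d] for d in dimension_scores)
    weighted_sum = sum(dimension_scores[d] * weights[d] for d in dimension_scores)
    return weighted_sum / total_weight

# Hypothetical per-dimension scores for one scene.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.5,
}
score = weighted_score(example, weights)
```

Normalizing by the supplied weights keeps the result on the same 0 to 1 scale as the inputs even when the remaining dimensions are unknown.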
Recent Runs
- 17190076 • Dec. 17, 2025, 12:01 a.m.
- 30081869 • Dec. 16, 2025, 12:01 a.m.
- 14149401 • Dec. 15, 2025, 12:01 a.m.
- 15175921 • Dec. 14, 2025, 12:01 a.m.
- 14252004 • Dec. 13, 2025, 12:01 a.m.
- 25565911 • Dec. 12, 2025, 12:01 a.m.
- 21380199 • Dec. 11, 2025, 12:01 a.m.
- 14647668 • Dec. 10, 2025, 12:01 a.m.
- 24481599 • Dec. 9, 2025, 12:01 a.m.
- 15788583 • Dec. 8, 2025, 12:01 a.m.