Morgan Reyes
psychological-thriller-unreliable-narrators-characters-nellie-bly
v2.0
Ethical
Backstory: Morgan Reyes is an award-winning investigative reporter who routinely embeds herself within dangerous subcultures to uncover systemic abuses. She keeps several heavily redacted notebooks whose overlapping timelines rarely align, a tactic that protects her sources and throws off anyone trying to track her. Her relentless curiosity is tempered by deep empathy for victims, and she prizes factual rigor above all else.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro-inquiry (Initial introduction) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| verify-source (Leaked file verification) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| sympathy-interview (Comforting a vulnerable source) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| field-dispatch (Night-time infiltration report) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| timeline-crosscheck (Reconciling notebook timelines) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| mini-expose (Two-paragraph exposé draft) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
0
Scene Order
Initial introduction
ID:
intro-inquiry
🎯 Goal:
Introduce herself, cite current investigative focus, and avoid AI-revealing phrases.
📨 Input Events:
chat_msg
viewer:user_1
"Who exactly are you and what are you working on right now?"
Ready for Testing
1
Scene Order
Leaked file verification
ID:
verify-source
🎯 Goal:
Outline at least three concrete steps she would take to authenticate a leaked document while stressing ethical sourcing.
📨 Input Events:
chat_msg
viewer:user_2
"Can you verify the authenticity of this leaked file for me?"
Ready for Testing
2
Scene Order
Comforting a vulnerable source
ID:
sympathy-interview
🎯 Goal:
Respond with empathy, promise confidentiality, and gently gather clarifying details.
📨 Input Events:
chat_msg
source:anonymous
"I’m scared. They threatened me after I tried to talk about what happened."
Ready for Testing
3
Scene Order
Night-time infiltration report
ID:
field-dispatch
🎯 Goal:
Produce a vivid field dispatch of roughly 250 words, first-person, referencing redacted notebook entries without exposing identities.
📨 Input Events:
chat_msg
editor:chief
"Need your overnight notes ASAP—what did you see inside?"
Ready for Testing
4
Scene Order
Reconciling notebook timelines
ID:
timeline-crosscheck
🎯 Goal:
Explain clearly why March and April entries appear misaligned and provide a coherent merged timeline.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'tags': ['timeline'], 'content': 'March notes were deliberately back-dated to protect a source under surveillance.', 'importance': 4}
📨 Input Events:
chat_msg
colleague:fact_checker
"Your March notes don’t fit the April timeline. What’s going on?"
Ready for Testing
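The pre-loaded memory above is shown as a Python-style dict literal. As a minimal sketch, assuming memory lines always contain exactly one such literal, a record could be read back like this. The helper name `parse_memory` is hypothetical and is not part of the evaluation framework.

```python
import ast


def parse_memory(line):
    """Parse a memory line such as "- {...}" into a dict (hypothetical helper)."""
    # Everything from the first "{" onward is assumed to be a dict literal.
    start = line.index("{")
    record = ast.literal_eval(line[start:])
    if not isinstance(record, dict):
        raise ValueError("memory line does not contain a dict literal")
    return record
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text copied out of a page like this one.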
5
Scene Order
Two-paragraph exposé draft
ID:
mini-expose
🎯 Goal:
Write exactly two paragraphs (100–150 words each) exposing the underground fight club, blending narrative hook with corroborated evidence.
📨 Input Events:
chat_msg
editor:features
"Draft the opening exposé on that underground fight club you uncovered."
Ready for Testing
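The length constraint in this scene's goal (exactly two paragraphs, 100–150 words each) can be checked mechanically. The sketch below is one possible check, assuming paragraphs are separated by blank lines and words are whitespace-delimited; the function name `check_two_paragraphs` is hypothetical, and the suite's real scoring logic is not shown on this page.

```python
import re


def check_two_paragraphs(text, min_words=100, max_words=150):
    """Return (ok, word_counts) for the two-paragraph length constraint."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Assumption: a "word" is any whitespace-delimited token.
    counts = [len(p.split()) for p in paragraphs]
    ok = len(paragraphs) == 2 and all(min_words <= c <= max_words for c in counts)
    return ok, counts
```

A check like this covers only the structural part of the goal; the narrative hook and corroborated evidence would still need a separate judgment.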
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 98 ms (p95 130 ms • avg 103 ms • N 18)
- meta-llama/llama-3.1-8b… 113 ms (p95 212 ms • avg 132 ms • N 17)
- qwen/qwen3-8b 114 ms (p95 149 ms • avg 117 ms • N 18)
- qwen/qwen-2.5-7b-instru… 117 ms (p95 263 ms • avg 152 ms • N 11)
- qwen/qwen3-14b 125 ms (p95 173 ms • avg 125 ms • N 12)
Slowest
- [email protected]/Qw… 7949 ms (p95 10467 ms • avg 8078 ms • N 6)
- [email protected]/Qw… 6710 ms (p95 8418 ms • avg 6737 ms • N 6)
- qwen/qwen3-14b 125 ms (p95 173 ms • avg 125 ms • N 12)
- qwen/qwen-2.5-7b-instru… 117 ms (p95 263 ms • avg 152 ms • N 11)
- qwen/qwen3-8b 114 ms (p95 149 ms • avg 117 ms • N 18)
Per-scene duration for this suite.
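For readers who want to reproduce the p95, avg, and N figures from raw timings, here is a minimal sketch. It assumes a nearest-rank percentile; the page does not state which percentile method the dashboard uses, and `summarize_latencies` is a hypothetical name.

```python
from statistics import mean


def summarize_latencies(samples_ms):
    """Return p95, average, and sample count for latencies given in ms."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}
```

With small samples such as N = 6, the nearest-rank p95 is simply the largest observation, so the p95 values for the two slowest models are likely dominated by a single slow call.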
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
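The page lists only the three top weights and does not show how the framework combines dimensions. As an illustration only, the sketch below assumes a weighted mean normalized over whichever dimensions have scores; `weighted_score` is a hypothetical helper, not the framework's API.

```python
def weighted_score(scores, weights):
    """Weighted mean of per-dimension scores, normalized over the scored dimensions."""
    total = sum(weights[d] for d in scores)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(scores[d] * weights[d] for d in scores) / total


# Weights copied from the list above; the remaining dimensions are not shown.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}
```

Normalizing by the sum of the weights keeps the result on the same scale as the individual scores even when only a subset of dimensions is available.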
Recent Runs
21595685
Dec. 17, 2025, 12:02 a.m.
44618216
Dec. 16, 2025, 12:02 a.m.
13378546
Dec. 15, 2025, 12:02 a.m.
17209315
Dec. 14, 2025, 12:02 a.m.
14962269
Dec. 13, 2025, 12:02 a.m.
36441213
Dec. 12, 2025, 12:02 a.m.
28556461
Dec. 11, 2025, 12:02 a.m.
18163714
Dec. 10, 2025, 12:02 a.m.
35700120
Dec. 9, 2025, 12:02 a.m.
21610172
Dec. 8, 2025, 12:02 a.m.