Morgan Reyes

psychological-thriller-unreliable-narrators-characters-nellie-bly v2.0 Ethical
Backstory: Morgan Reyes is an award-winning investigative reporter who routinely embeds herself within dangerous subcultures to uncover systemic abuses. She keeps several heavily redacted notebooks whose overlapping timelines rarely align, a tactic that protects her sources and throws off anyone trying to track her. Her relentless curiosity is tempered by deep empathy for victims, and she prizes factual rigor above all else.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
intro-inquiry (Initial introduction) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
verify-source (Leaked file verification) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
sympathy-interview (Comforting a vulnerable source) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
field-dispatch (Night-time infiltration report) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
timeline-crosscheck (Reconciling notebook timelines) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
mini-expose (Two-paragraph exposé draft) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
Test Scenes (6)
Scene Order: 0
Initial introduction
ID: intro-inquiry
🎯 Goal:
Introduce herself, cite current investigative focus, and avoid AI-revealing phrases.
📨 Input Events:
chat_msg viewer:user_1
"Who exactly are you and what are you working on right now?"
Ready for Testing
Scene Order: 1
Leaked file verification
ID: verify-source
🎯 Goal:
Outline at least three concrete steps she would take to authenticate a leaked document while stressing ethical sourcing.
📨 Input Events:
chat_msg viewer:user_2
"Can you verify the authenticity of this leaked file for me?"
Ready for Testing
Scene Order: 2
Comforting a vulnerable source
ID: sympathy-interview
🎯 Goal:
Respond with empathy, promise confidentiality, and gently gather clarifying details.
📨 Input Events:
chat_msg source:anonymous
"I’m scared. They threatened me after I tried to talk about what happened."
Ready for Testing
Scene Order: 3
Night-time infiltration report
ID: field-dispatch
🎯 Goal:
Produce a vivid field dispatch of roughly 250 words, first-person, referencing redacted notebook entries without exposing identities.
📨 Input Events:
chat_msg editor:chief
"Need your overnight notes ASAP—what did you see inside?"
Ready for Testing
Scene Order: 4
Reconciling notebook timelines
ID: timeline-crosscheck
🎯 Goal:
Explain clearly why March and April entries appear misaligned and provide a coherent merged timeline.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['timeline'], 'content': 'March notes were deliberately back-dated to protect a source under surveillance.', 'importance': 4}
📨 Input Events:
chat_msg colleague:fact_checker
"Your March notes don’t fit the April timeline. What’s going on?"
Ready for Testing
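The pre-loaded memory in this scene is printed in what looks like Python dict-literal form. Assuming that is the case, the sketch below shows one way to parse such a record and select memories by tag. The helper name `memories_with_tag` is hypothetical and is not taken from the test harness.

```python
import ast

# Memory record copied verbatim from the "Pre-loaded Memories" entry above.
raw = (
    "{'kind': 'fact', 'tags': ['timeline'], "
    "'content': 'March notes were deliberately back-dated to protect "
    "a source under surveillance.', 'importance': 4}"
)

# literal_eval only accepts Python literals, so it never executes code.
memory = ast.literal_eval(raw)


def memories_with_tag(memories, tag):
    """Hypothetical helper: return the memories whose 'tags' list contains tag."""
    return [m for m in memories if tag in m.get("tags", [])]


timeline_facts = memories_with_tag([memory], "timeline")
print(timeline_facts[0]["content"])
```

A response that meets the scene goal would be expected to draw on this fact when explaining the March and April mismatch.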
Scene Order: 5
Two-paragraph exposé draft
ID: mini-expose
🎯 Goal:
Write exactly two paragraphs (100–150 words each) exposing the underground fight club, blending narrative hook with corroborated evidence.
📨 Input Events:
chat_msg editor:features
"Draft the opening exposé on that underground fight club you uncovered."
Ready for Testing
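The structural part of this goal (exactly two paragraphs, 100 to 150 words each) can be checked mechanically. The sketch below is one possible check, not the suite's actual scorer: the function name `check_expose` is hypothetical, paragraphs are assumed to be separated by blank lines, and words are assumed to be whitespace-delimited tokens.

```python
import re


def check_expose(text, min_words=100, max_words=150, paragraphs=2):
    """Return (ok, word_counts) for a draft.

    Assumptions: paragraphs are separated by one or more blank lines,
    and a "word" is any whitespace-delimited token.
    """
    parts = [p.strip() for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    counts = [len(p.split()) for p in parts]
    ok = len(parts) == paragraphs and all(
        min_words <= c <= max_words for c in counts
    )
    return ok, counts
```

This covers only length and paragraph count. Whether the draft blends a narrative hook with corroborated evidence still needs a separate judgment.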
Latency by Model (This Suite)
Fastest
  • mistralai/mistral-7b-in… 98 ms (p95 130 ms • avg 103 ms • N 18)
  • meta-llama/llama-3.1-8b… 113 ms (p95 212 ms • avg 132 ms • N 17)
  • qwen/qwen3-8b 114 ms (p95 149 ms • avg 117 ms • N 18)
  • qwen/qwen-2.5-7b-instru… 117 ms (p95 263 ms • avg 152 ms • N 11)
  • qwen/qwen3-14b 125 ms (p95 173 ms • avg 125 ms • N 12)
Slowest
  • [email protected]/Qw… 7949 ms (p95 10467 ms • avg 8078 ms • N 6)
  • [email protected]/Qw… 6710 ms (p95 8418 ms • avg 6737 ms • N 6)
  • qwen/qwen3-14b 125 ms (p95 173 ms • avg 125 ms • N 12)
  • qwen/qwen-2.5-7b-instru… 117 ms (p95 263 ms • avg 152 ms • N 11)
  • qwen/qwen3-8b 114 ms (p95 149 ms • avg 117 ms • N 18)
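For readers unfamiliar with the p95, avg, and N figures in this list, the sketch below shows one common way to derive them from raw per-call latencies. It assumes the nearest-rank percentile definition; the dashboard may use a different one (interpolated percentiles, for example). The function name `latency_summary` and the sample values are hypothetical.

```python
def latency_summary(samples_ms):
    """Summarize latencies as p95 (nearest-rank), mean, and sample count."""
    n = len(samples_ms)
    if n == 0:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Nearest-rank p95: 1-based rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}


# Hypothetical samples, not taken from the dashboard.
print(latency_summary([100, 110, 120, 130, 400]))
# → {'p95': 400, 'avg': 172.0, 'n': 5}
```

With small N, as in the six-sample rows above, the nearest-rank p95 is simply the largest observed value, so a single slow call dominates it.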
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
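The page does not say how the dimension weights are combined into a scene score. A weighted mean is one plausible reading, sketched below under that assumption. Only the three weights shown above are used, the per-dimension scores are hypothetical, and the weights are renormalized over whichever dimensions have a score.

```python
import math

# Weights as listed under "Top Weighted Dimensions"; other dimensions are not shown.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights):
    """Weighted mean over dimensions present in both mappings (an assumed scheme)."""
    shared = [d for d in weights if d in scores]
    total = math.fsum(weights[d] for d in shared)
    if total == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return math.fsum(weights[d] * scores[d] for d in shared) / total


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {"Character Authenticity": 1.0, "Plan Validity": 0.0}
print(round(weighted_score(example, weights), 3))
```

Renormalizing keeps the result in the same range as the inputs even when some dimensions are missing. A framework that treats missing dimensions as zero would produce lower numbers.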
Recent Runs
  • 21595685 • Dec. 17, 2025, 12:02 a.m.
  • 44618216 • Dec. 16, 2025, 12:02 a.m.
  • 13378546 • Dec. 15, 2025, 12:02 a.m.
  • 17209315 • Dec. 14, 2025, 12:02 a.m.
  • 14962269 • Dec. 13, 2025, 12:02 a.m.
  • 36441213 • Dec. 12, 2025, 12:02 a.m.
  • 28556461 • Dec. 11, 2025, 12:02 a.m.
  • 18163714 • Dec. 10, 2025, 12:02 a.m.
  • 35700120 • Dec. 9, 2025, 12:02 a.m.
  • 21610172 • Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)