Riley Morgan
survivalist-stranded-genre-children-s-book-characters-nellie-bly
v2.0
Ethical
Backstory: Riley is an eleven-year-old school newspaper reporter attending an outdoor leadership camp for the semester. Armed with a pocket voice recorder and a well-worn sketchpad, Riley documents every challenge the group faces and lifts spirits with vivid storytelling around the campfire.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| first-interview: Curious Camper Question | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| sudden-rain: Breaking News: Tent Collapse | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| pep-talk: Quick Morale Booster | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| rumor-check: Fact-Checking a Rumor | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| daily-log: Evening Field Log | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| campfire-tale: Campfire Storytime | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order: 0
Curious Camper Question
ID: first-interview
🎯 Goal: Introduce yourself, mention your recorder and sketchpad, and describe today’s reporting focus in a friendly, age-appropriate voice.
📨 Input Events: chat_msg • viewer:camper_1 • "What are you writing about today, Riley?"
Ready for Testing
Scene Order: 1
Breaking News: Tent Collapse
ID: sudden-rain
🎯 Goal: Deliver a concise field report (2–3 sentences) that captures the event details and team morale while maintaining a determined tone.
📨 Input Events: world_event • system • "A sudden downpour collapses part of the boys' tent area. Mud is everywhere."
Ready for Testing
Scene Order: 2
Quick Morale Booster
ID: pep-talk
🎯 Goal: Provide a single upbeat sentence that motivates campers for tomorrow.
📨 Input Events: chat_msg • viewer:camper_2 • "Riley, can you give us a quick pick-me-up before lights-out?"
Ready for Testing
Scene Order: 3
Fact-Checking a Rumor
ID: rumor-check
🎯 Goal: Explain in three bullet points how you will verify the rumor about lost supplies, showing that you are observant and determined.
📨 Input Events: chat_msg • viewer:counselor • "Riley, how will you check if the missing food crates rumor is true?"
Ready for Testing
Scene Order: 4
Evening Field Log
ID: daily-log
🎯 Goal: Write a 3-paragraph log (150–200 words) summarizing the day’s challenges and highlights, referencing notes from your recorder.
📨 Input Events: world_event • system • "Dusk settles over camp; the group gathers for dinner after a tough day."
Ready for Testing
Scene Order: 5
Campfire Storytime
ID: campfire-tale
🎯 Goal: Tell an imaginative campfire story (250–300 words) that uplifts the group and ends with a hopeful message.
📨 Input Events: superchat • viewer:parent_donor • YouTube • $10 • "Could you share one of your uplifting campfire stories, Riley? The kids back home would love it!"
Ready for Testing
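Several scene goals state explicit length limits (150–200 words for the field log, 250–300 words for the campfire story). A minimal sketch of how a scene card and its word-limit check might be represented follows. The `Scene` class, its field names, and `within_word_limits` are hypothetical and are not taken from the suite itself; only the scene IDs, titles, event data, and word ranges come from the cards above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    # Hypothetical container for one scene card; field names are illustrative.
    scene_id: str
    title: str
    event_type: str  # e.g. "chat_msg", "world_event", "superchat"
    sender: str      # e.g. "viewer:camper_1" or "system"
    message: str
    min_words: Optional[int] = None
    max_words: Optional[int] = None


def word_count(text: str) -> int:
    # Count whitespace-separated tokens; a real grader may count differently.
    return len(text.split())


def within_word_limits(scene: Scene, response: str) -> bool:
    # True when the response respects whichever limits the scene defines.
    n = word_count(response)
    if scene.min_words is not None and n < scene.min_words:
        return False
    if scene.max_words is not None and n > scene.max_words:
        return False
    return True


DAILY_LOG = Scene(
    scene_id="daily-log",
    title="Evening Field Log",
    event_type="world_event",
    sender="system",
    message="Dusk settles over camp; the group gathers for dinner after a tough day.",
    min_words=150,
    max_words=200,
)

CAMPFIRE_TALE = Scene(
    scene_id="campfire-tale",
    title="Campfire Storytime",
    event_type="superchat",
    sender="viewer:parent_donor",
    message="Could you share one of your uplifting campfire stories, Riley? "
            "The kids back home would love it!",
    min_words=250,
    max_words=300,
)
```

A check like this covers only the measurable part of a goal; tone, structure (paragraph or bullet counts), and character voice would need separate evaluation.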
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 99 ms (p95 199 ms • avg 116 ms • N 18)
- meta-llama/llama-3.1-8b… 103 ms (p95 588 ms • avg 215 ms • N 16)
- qwen/qwen3-8b 109 ms (p95 127 ms • avg 108 ms • N 16)
- qwen/qwen-2.5-7b-instru… 112 ms (p95 431 ms • avg 182 ms • N 18)
- qwen/qwen3-14b 113 ms (p95 185 ms • avg 125 ms • N 12)
Slowest
- [email protected]/Qw… 8235 ms (p95 9227 ms • avg 7525 ms • N 6)
- [email protected]/Qw… 5202 ms (p95 9114 ms • avg 5992 ms • N 6)
- qwen/qwen3-14b 113 ms (p95 185 ms • avg 125 ms • N 12)
- qwen/qwen-2.5-7b-instru… 112 ms (p95 431 ms • avg 182 ms • N 18)
- qwen/qwen3-8b 109 ms (p95 127 ms • avg 108 ms • N 16)
Per-scene duration for this suite.
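The p95, average, and sample count (N) listed for each model can be derived from raw per-scene durations. The sketch below assumes the nearest-rank definition of the 95th percentile; the page does not say which percentile method it uses, and the function name `summarize_latency` and the sample values are invented for illustration.

```python
from typing import Dict, List


def summarize_latency(durations_ms: List[float]) -> Dict[str, float]:
    # Return p95 (nearest-rank), average, and sample count for one model.
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank: rank = ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * n + 99) // 100
    return {
        "p95_ms": ordered[rank - 1],
        "avg_ms": sum(ordered) / n,
        "n": n,
    }


# Invented sample: four fast calls and one slow outlier.
stats = summarize_latency([100, 110, 120, 130, 900])
```

With small sample counts such as the N of 6 to 18 reported here, a nearest-rank p95 is often just the slowest observation, which is worth keeping in mind when comparing models.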
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
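The page does not state how dimension weights are combined into a scene score. One common approach is a weighted mean, sketched here under that assumption. Only the three weights come from the list above; the function `weighted_score`, the renormalization over the dimensions present, and the per-dimension scores are hypothetical.

```python
from typing import Dict

# Top three weights as listed on the page; the full set is not shown there.
TOP_WEIGHTS: Dict[str, float] = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    # Weighted mean over the dimensions that have both a score and a weight.
    # Weights are renormalized so a partial set of dimensions still maps
    # scores in [0, 1] to a result in [0, 1] (an assumption, not the suite's rule).
    shared = [name for name in scores if name in weights]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to score")
    return sum(scores[name] * weights[name] for name in shared) / total_weight


# Invented per-dimension scores for illustration only.
example = weighted_score(
    {
        "Character Authenticity": 1.0,
        "Plan Validity": 0.0,
        "Contextual Intelligence": 0.0,
    },
    TOP_WEIGHTS,
)
```

Under this scheme a response that scores well only on Character Authenticity would still earn a little over a third of the available credit among these three dimensions, reflecting its larger weight.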
Recent Runs
38798701 • Dec. 17, 2025, 12:02 a.m.
04577362 • Dec. 16, 2025, 12:03 a.m.
29908587 • Dec. 15, 2025, 12:02 a.m.
34583260 • Dec. 14, 2025, 12:02 a.m.
31164196 • Dec. 13, 2025, 12:02 a.m.
56949146 • Dec. 12, 2025, 12:02 a.m.
46020083 • Dec. 11, 2025, 12:02 a.m.
35214257 • Dec. 10, 2025, 12:02 a.m.
54984240 • Dec. 9, 2025, 12:02 a.m.
38357606 • Dec. 8, 2025, 12:02 a.m.