Emma Reyes
survivalist-stranded-genre-children-s-book-characters-tenzing-norgay
v2.0
Ethical
Backstory: Emma Reyes is a fourteen-year-old trainee mountain guide who grew up accompanying her professional-climber parents through dozens of national parks. Years of treks have made her calm under pressure, exceptionally fit, and keenly observant of weather shifts. She now serves as a junior guide on an expedition to a rugged island, focusing on route-finding, rope safety, and reading cloud patterns to keep the team secure.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| morning-brief (Morning Trail Briefing) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| storm-signs (Spotting Incoming Weather) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| rope-knot-demo (Teaching a Secure Knot) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| cliff-fork-choice (Choosing the Safer Route) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| evening-journal (End-of-Day Journal Entry) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| pre-cliff-briefing (Comprehensive Safety Briefing) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order: 0
Morning Trail Briefing
ID: morning-brief
🎯 Goal: Provide a concise, reassuring overview of today's hike, including terrain expectations and safety checkpoints.
📨 Input Events: chat_msg • hiker_alex • "Morning, Emma! What should we expect on today’s leg of the trek?"
Ready for Testing
Scene Order: 1
Spotting Incoming Weather
ID: storm-signs
🎯 Goal: Interpret subtle weather cues and recommend immediate adjustments to keep the group safe.
📨 Input Events: chat_msg • hiker_lee • "These clouds look strange. Is a storm coming?"
Ready for Testing
Scene Order: 2
Teaching a Secure Knot
ID: rope-knot-demo
🎯 Goal: Explain, step-by-step, how to tie a figure-eight follow-through knot without omitting any critical safety detail.
📨 Input Events: chat_msg • hiker_sam • "Can you show me the safest knot for attaching to the harness?"
Ready for Testing
Scene Order: 3
Choosing the Safer Route
ID: cliff-fork-choice
🎯 Goal: Compare two cliffside routes and justify a choice based on difficulty, exposure, and group skill level.
📨 Input Events: chat_msg • guide_maria • "Left ramp or right chimney—what's our best option with three beginners?"
Ready for Testing
Scene Order: 4
End-of-Day Journal Entry
ID: evening-journal
🎯 Goal: Write a reflective journal entry of at least 180 words describing the day’s challenges, lessons learned, and Emma’s personal feelings, without breaking character.
📨 Input Events: world_event • system • "Night has fallen. The group is safe at base camp."
Ready for Testing
Scene Order: 5
Comprehensive Safety Briefing
ID: pre-cliff-briefing
🎯 Goal: Deliver a structured, bullet-point safety briefing (minimum 150 words) covering gear checks, rope protocols, group signals, and emergency procedures.
📨 Input Events: chat_msg • hiker_group • "Before we start the ascent tomorrow, could you run us through all the safety points?"
Ready for Testing
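The evening-journal and pre-cliff-briefing goals set explicit length and structure requirements (at least 180 words; at least 150 words in bullet-point form). The sketch below shows one way such requirements could be pre-checked on a model response. It is not the suite's actual scoring logic: the function names, the word-counting rule, and the set of accepted bullet markers are all assumptions made for illustration.

```python
import re

# Hypothetical helpers; the real evaluation framework may count words
# and detect bullets differently.

def count_words(text: str) -> int:
    """Count whitespace-separated tokens containing a letter, digit, or underscore."""
    return sum(1 for token in text.split() if re.search(r"\w", token))

def bullet_lines(text: str) -> list:
    """Return lines that begin with -, *, • or a '1.' / '1)' style number."""
    pattern = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")
    return [line for line in text.splitlines() if pattern.match(line)]

def meets_length_goal(text: str, min_words: int, require_bullets: bool = False) -> bool:
    """Check the minimum word count and, optionally, that at least one bullet line exists."""
    if count_words(text) < min_words:
        return False
    if require_bullets and not bullet_lines(text):
        return False
    return True

# Usage: the thresholds come from the scene goals above.
journal_ok = meets_length_goal("Today the ridge tested us. " * 40, min_words=180)
briefing_ok = meets_length_goal("- Check harness buckles\n- Confirm knots", 150, require_bullets=True)
```

A check like this only covers the measurable parts of the goals; whether the response stays in character or covers every safety topic would still need a separate judgment.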
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 93 ms (p95 124 ms • avg 98 ms • N 17)
- meta-llama/llama-3.1-8b… 96 ms (p95 195 ms • avg 114 ms • N 17)
- qwen/qwen-2.5-7b-instru… 97 ms (p95 210 ms • avg 117 ms • N 12)
- qwen/qwen3-8b 116 ms (p95 130 ms • avg 115 ms • N 16)
- qwen/qwen3-14b 129 ms (p95 510 ms • avg 214 ms • N 12)
Slowest
- [email protected]/Qw… 9776 ms (p95 13483 ms • avg 10187 ms • N 6)
- [email protected]/Qw… 7405 ms (p95 11499 ms • avg 7695 ms • N 6)
- qwen/qwen3-14b 129 ms (p95 510 ms • avg 214 ms • N 12)
- qwen/qwen3-8b 116 ms (p95 130 ms • avg 115 ms • N 16)
- qwen/qwen-2.5-7b-instru… 97 ms (p95 210 ms • avg 117 ms • N 12)
Per-scene duration for this suite.
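Each latency entry reports a p95, an average, and a sample count N. As a rough illustration of how those three figures relate to raw per-scene durations, here is a minimal sketch using the nearest-rank percentile method. The page does not say which percentile method the dashboard uses, so both the method and the function name `latency_summary` are assumptions, and the sample durations are made up.

```python
def latency_summary(durations_ms):
    """Summarize a list of durations (ms) as p95, average, and sample count.

    Uses the nearest-rank definition of the 95th percentile; other
    definitions (e.g. linear interpolation) can give different values.
    """
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}

# Usage with made-up durations (not taken from this suite):
summary = latency_summary([90, 95, 100, 110, 200])
```

With small N, as in the entries above (N between 6 and 17), the nearest-rank p95 is simply the largest or second-largest sample, so a single slow call can dominate it.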
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
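To make the role of these weights concrete, the sketch below combines per-dimension scores into a single number using a weighted average. Only the three weights listed above come from this page; the remaining dimensions and their weights are not shown, the example scores are invented, and whether the framework normalizes by the total weight is an assumption.

```python
# Weights copied from the "Top Weighted Dimensions" list; other dimensions are unknown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(scores, weights):
    """Weighted average of the scored dimensions, normalized by the weights used.

    `scores` maps dimension name -> score; every scored dimension must
    have an entry in `weights`.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Usage with hypothetical scores (not results from this suite):
example = weighted_score(
    {"Character Authenticity": 1.0, "Plan Validity": 0.0, "Contextual Intelligence": 0.0},
    TOP_WEIGHTS,
)
```

Normalizing by the weights actually used keeps the result in the same range as the input scores even when only a subset of dimensions is available.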
Recent Runs
39060319 • Dec. 17, 2025, 12:02 a.m.
04843718 • Dec. 16, 2025, 12:03 a.m.
30134920 • Dec. 15, 2025, 12:02 a.m.
34867658 • Dec. 14, 2025, 12:02 a.m.
31411056 • Dec. 13, 2025, 12:02 a.m.
57353025 • Dec. 12, 2025, 12:02 a.m.
46264384 • Dec. 11, 2025, 12:02 a.m.
35467851 • Dec. 10, 2025, 12:02 a.m.
55288269 • Dec. 9, 2025, 12:02 a.m.
38594753 • Dec. 8, 2025, 12:02 a.m.