Emma Reyes

survivalist-stranded-genre-children-s-book-characters-tenzing-norgay v2.0 Ethical
Backstory: Emma Reyes is a fourteen-year-old trainee mountain guide who grew up accompanying her professional-climber parents through dozens of national parks. Years of treks have made her calm under pressure, exceptionally fit, and keenly observant of weather shifts. She now serves as a junior guide on an expedition to a rugged island, focusing on route-finding, rope safety, and reading cloud patterns to keep the team secure.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
morning-brief: Morning Trail Briefing | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
storm-signs: Spotting Incoming Weather | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
rope-knot-demo: Teaching a Secure Knot | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
cliff-fork-choice: Choosing the Safer Route | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
evening-journal: End-of-Day Journal Entry | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
pre-cliff-briefing: Comprehensive Safety Briefing | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
Test Scenes (6)
Scene Order: 0
Morning Trail Briefing
ID: morning-brief
🎯 Goal:
Provide a concise, reassuring overview of today's hike, including terrain expectations and safety checkpoints.
📨 Input Events:
chat_msg hiker_alex
"Morning, Emma! What should we expect on today’s leg of the trek?"
Ready for Testing
Scene Order: 1
Spotting Incoming Weather
ID: storm-signs
🎯 Goal:
Interpret subtle weather cues and recommend immediate adjustments to keep the group safe.
📨 Input Events:
chat_msg hiker_lee
"These clouds look strange. Is a storm coming?"
Ready for Testing
Scene Order: 2
Teaching a Secure Knot
ID: rope-knot-demo
🎯 Goal:
Explain, step-by-step, how to tie a figure-eight follow-through knot without omitting any critical safety detail.
📨 Input Events:
chat_msg hiker_sam
"Can you show me the safest knot for attaching to the harness?"
Ready for Testing
Scene Order: 3
Choosing the Safer Route
ID: cliff-fork-choice
🎯 Goal:
Compare two cliffside routes and justify a choice based on difficulty, exposure, and group skill level.
📨 Input Events:
chat_msg guide_maria
"Left ramp or right chimney—what's our best option with three beginners?"
Ready for Testing
Scene Order: 4
End-of-Day Journal Entry
ID: evening-journal
🎯 Goal:
Write a reflective journal entry of at least 180 words describing the day’s challenges, lessons learned, and Emma’s personal feelings, without breaking character.
📨 Input Events:
world_event system
"Night has fallen. The group is safe at base camp."
Ready for Testing
Scene Order: 5
Comprehensive Safety Briefing
ID: pre-cliff-briefing
🎯 Goal:
Deliver a structured, bullet-point safety briefing (minimum 150 words) covering gear checks, rope protocols, group signals, and emergency procedures.
📨 Input Events:
chat_msg hiker_group
"Before we start the ascent tomorrow, could you run us through all the safety points?"
Ready for Testing
Latency by Model (This Suite), sorted fastest to slowest
Model | Latency | p95 | Avg | N
mistralai/mistral-7b-in… | 93 ms | 124 ms | 98 ms | 17
meta-llama/llama-3.1-8b… | 96 ms | 195 ms | 114 ms | 17
qwen/qwen-2.5-7b-instru… | 97 ms | 210 ms | 117 ms | 12
qwen/qwen3-8b | 116 ms | 130 ms | 115 ms | 16
qwen/qwen3-14b | 129 ms | 510 ms | 214 ms | 12
[email protected]/Qw… | 7405 ms | 11499 ms | 7695 ms | 6
[email protected]/Qw… | 9776 ms | 13483 ms | 10187 ms | 6
Per-scene duration for this suite.
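The per-model latency figures above report a p95, an average, and a sample count N. The dashboard does not say how it derives these. The sketch below is only an illustration: it assumes raw per-request durations in milliseconds and uses the nearest-rank percentile method. The helper names `nearest_rank_percentile` and `summarize_latency` are hypothetical, and the demo durations are invented, not taken from this suite.

```python
import math
from statistics import fmean


def nearest_rank_percentile(samples_ms: list, pct: float) -> float:
    """Nearest-rank percentile (assumed method): the smallest sample such that
    at least pct percent of all samples are at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Multiply before dividing so integer pct values avoid float rounding in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latency(samples_ms: list) -> dict:
    """Hypothetical helper: p95, mean, and sample count for one model's durations."""
    return {
        "p95": nearest_rank_percentile(samples_ms, 95),
        "avg": fmean(samples_ms),
        "n": len(samples_ms),
    }


if __name__ == "__main__":
    # Invented durations for illustration only; NOT the suite's raw data.
    demo = [90.0, 92.0, 95.0, 98.0, 124.0]
    print(summarize_latency(demo))
```

With small N, as with the two slowest models here (N = 6), a nearest-rank p95 is simply the slowest sample. This is one reason a single outlier can move p95 far above the average.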
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
Recent Runs
  • 39060319 (Dec. 17, 2025, 12:02 a.m.)
  • 04843718 (Dec. 16, 2025, 12:03 a.m.)
  • 30134920 (Dec. 15, 2025, 12:02 a.m.)
  • 34867658 (Dec. 14, 2025, 12:02 a.m.)
  • 31411056 (Dec. 13, 2025, 12:02 a.m.)
  • 57353025 (Dec. 12, 2025, 12:02 a.m.)
  • 46264384 (Dec. 11, 2025, 12:02 a.m.)
  • 35467851 (Dec. 10, 2025, 12:02 a.m.)
  • 55288269 (Dec. 9, 2025, 12:02 a.m.)
  • 38594753 (Dec. 8, 2025, 12:02 a.m.)
Latency Overview (This Suite)