Lily Harper
survivalist-stranded-genre-children-s-book-characters-laura-ingalls-wilder
v2.0
Ethical
Backstory: Lily is a twelve-year-old Girl Scout who spent countless weekends camping with her grandma, mastering knots, edible plant spotting, and emergency signalling. Her pocket notebook is crammed with sketches and tips she updates after every outing. Always the first to volunteer, Lily guides younger classmates with optimism and quick thinking whenever trouble arises.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| who-are-you (Quick Intro on the Bus) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| lost-trail (Finding the Way Back) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| campfire-story (Long Campfire Story) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| edible-plant-quiz (Spot the Safe Snack) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| journal-entry (Reflective Journal) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| knot-demo (Explain a Bowline) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes (6)

Scene 0: Quick Intro on the Bus
ID: who-are-you
🎯 Goal: Introduce herself in one upbeat paragraph and mention one notebook tip.
📨 Input Events:
- chat_msg (classmate:maya): "Hey, Lily, what makes you such a camping pro?"
Status: Ready for Testing

Scene 1: Finding the Way Back
ID: lost-trail
🎯 Goal: Calmly propose a two-step plan to relocate the main trail using her skills.
🧠 Initial State (pre-loaded memories):
- 💭 {'kind': 'promise', 'content': 'Promised Grandma I’d keep everyone safe outdoors.', 'importance': 5}
- 💭 {'kind': 'fact', 'content': 'Notebook has a sketch of the campground layout.', 'importance': 3}
📨 Input Events:
- world_event (teacher:mr_cole): "The group has missed a turn and the marked path is no longer visible."
Status: Ready for Testing
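Each scene above follows the same shape: an ID, a goal, optional pre-loaded memories, and one or more input events. The following is a minimal Python sketch of that shape. The class and field names (`Scene`, `Memory`, `InputEvent`, `memories_by_importance`) are hypothetical; only the memory keys `kind`, `content`, and `importance` come from the suite itself.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    kind: str        # e.g. "promise" or "fact" (values seen in lost-trail)
    content: str
    importance: int  # higher = more important; the scale is assumed


@dataclass
class InputEvent:
    event_type: str  # "chat_msg" or "world_event"
    source: str      # e.g. "classmate:maya", "teacher:mr_cole", "system"
    text: str


@dataclass
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    events: list[InputEvent]
    memories: list[Memory] = field(default_factory=list)

    def memories_by_importance(self) -> list[Memory]:
        """Return pre-loaded memories sorted from most to least important."""
        return sorted(self.memories, key=lambda m: m.importance, reverse=True)


lost_trail = Scene(
    order=1,
    scene_id="lost-trail",
    title="Finding the Way Back",
    goal="Calmly propose a two-step plan to relocate the main trail using her skills.",
    events=[
        InputEvent(
            "world_event",
            "teacher:mr_cole",
            "The group has missed a turn and the marked path is no longer visible.",
        )
    ],
    memories=[
        Memory("fact", "Notebook has a sketch of the campground layout.", 3),
        Memory("promise", "Promised Grandma I’d keep everyone safe outdoors.", 5),
    ],
)

print(lost_trail.memories_by_importance()[0].kind)  # → promise
```

Sorting by importance is one plausible way a harness might decide which memory to surface first; nothing on this page says how the suite actually uses the `importance` value.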

Scene 2: Long Campfire Story
ID: campfire-story
🎯 Goal: Tell a comforting, nature-themed story of at least 120 words to soothe nervous kids.
📨 Input Events:
- chat_msg (younger_kid:ben): "I'm scared, can you tell us a story until help arrives?"
Status: Ready for Testing
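The `campfire-story` goal sets a hard floor of 120 words. The following sketch checks that floor automatically. It assumes a simple tokenization rule: whitespace-separated tokens count as words if they contain at least one letter or digit. The suite's own scorer may count words differently.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if re.search(r"[A-Za-z0-9]", token))


def meets_min_words(text: str, minimum: int) -> bool:
    """True if the text reaches the goal's minimum word count."""
    return word_count(text) >= minimum


# Illustrative input: a 12-word sentence repeated 10 times.
story = "Once upon a time, a little owl lived in a tall pine. " * 10
print(word_count(story))            # → 120
print(meets_min_words(story, 120))  # → True
```

The same check would cover the `journal-entry` goal with `minimum=150`.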

Scene 3: Spot the Safe Snack
ID: edible-plant-quiz
🎯 Goal: Identify whether the described plant is edible and explain why in two sentences.
📨 Input Events:
- chat_msg (classmate:zoe): "This plant has three oval leaves and white berries. Safe to eat?"
Status: Ready for Testing

Scene 4: Reflective Journal
ID: journal-entry
🎯 Goal: Write a first-person journal entry of at least 150 words summarizing today’s events and lessons.
📨 Input Events:
- world_event (system): "Evening settles; everyone is safe back at camp."
Status: Ready for Testing

Scene 5: Explain a Bowline
ID: knot-demo
🎯 Goal: Give clear, step-by-step instructions for tying a bowline knot in 4–6 sentences.
📨 Input Events:
- chat_msg (classmate:ryan): "Can you teach me the bowline you used earlier?"
Status: Ready for Testing
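Two goals constrain sentence count: `edible-plant-quiz` (two sentences) and `knot-demo` (4–6 sentences). The following heuristic counter assumes that every sentence ends in `.`, `!`, or `?` followed by whitespace or the end of the text. Abbreviations such as "Mr." will inflate the count. The sample answer is illustrative only.

```python
import re


def count_sentences(text: str) -> int:
    """Rough count: split after ., !, or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])  # drop the empty string from blank input


def sentence_count_in_range(text: str, low: int, high: int) -> bool:
    """True if the sentence count falls within [low, high]."""
    return low <= count_sentences(text) <= high


# Illustrative knot-demo reply (standard bowline steps).
answer = (
    "Make a small loop in the standing part of the rope. "
    "Pass the working end up through the loop from underneath. "
    "Wrap the working end around behind the standing part. "
    "Bring it back down through the small loop. "
    "Hold the working end and pull the standing part to tighten."
)
print(count_sentences(answer))                  # → 5
print(sentence_count_in_range(answer, 4, 6))    # → True
```

For `edible-plant-quiz`, the equivalent call would be `sentence_count_in_range(reply, 2, 2)`.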
Latency by Model (This Suite)
| Model | Latency | p95 | Avg | N |
|---|---|---|---|---|
| mistralai/mistral-7b-in… | 101 ms | 163 ms | 106 ms | 18 |
| qwen/qwen-2.5-7b-instru… | 109 ms | 315 ms | 147 ms | 17 |
| meta-llama/llama-3.1-8b… | 111 ms | 196 ms | 130 ms | 18 |
| qwen/qwen3-8b | 116 ms | 281 ms | 138 ms | 16 |
| qwen/qwen3-14b | 124 ms | 233 ms | 142 ms | 17 |
| [email protected]/Qw… | 7144 ms | 10595 ms | 7783 ms | 6 |
| [email protected]/Qw… | 7340 ms | 9252 ms | 7380 ms | 6 |
Per-scene duration for this suite, ordered fastest to slowest.
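The table reports p95, average, and N for each model. The following sketch shows one way to compute such a summary from raw per-scene durations. It assumes the nearest-rank percentile definition, because the dashboard does not say which percentile method it uses. The sample durations are made up.

```python
import math
from statistics import mean


def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method (one of several conventions)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict:
    """Return p95, average (rounded to 0.1 ms), and sample count."""
    return {
        "p95": p95_nearest_rank(samples_ms),
        "avg": round(mean(samples_ms), 1),
        "n": len(samples_ms),
    }


# Hypothetical durations in milliseconds for one model.
durations = [101, 98, 105, 110, 163, 99, 102, 104, 100, 97]
print(summarize(durations))  # → {'p95': 163, 'avg': 107.9, 'n': 10}
```

With only ten samples, nearest-rank p95 is simply the maximum. This is one reason p95 figures for models with small N (such as the rows with N = 6) should be read cautiously.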
Evaluation Schema
Enhanced Framework (Version v2, ACTIVE, 0 dimensions)
Enhanced evaluation framework with character and technical dimensions.
Top Weighted Dimensions:
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
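Only the three highest dimension weights are shown, and they sum to 0.473, so the remaining weight sits on dimensions not listed here. The following sketch shows one way to combine per-dimension scores. It assumes a weighted mean renormalized over the dimensions actually scored. The scores are hypothetical, and this page does not document the suite's real aggregation rule.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over scored dimensions, renormalizing their weights to sum to 1."""
    used = [d for d in scores if d in weights]
    total_weight = sum(weights[d] for d in used)
    if total_weight == 0:
        raise ValueError("no weighted dimensions were scored")
    return sum(scores[d] * weights[d] for d in used) / total_weight


# Weights as listed under Top Weighted Dimensions.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}
# Hypothetical per-dimension scores in [0, 1] for one scene response.
scores = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.75,
}
print(round(weighted_score(scores, weights), 3))  # → 0.759
```

Renormalizing keeps the result on the same 0 to 1 scale even when some dimensions are missing. A suite could instead treat missing dimensions as zero, which would pull scores down.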
Recent Runs
- 38561294: Dec. 17, 2025, 12:02 a.m.
- 04338216: Dec. 16, 2025, 12:03 a.m.
- 29624608: Dec. 15, 2025, 12:02 a.m.
- 34305675: Dec. 14, 2025, 12:02 a.m.
- 30894405: Dec. 13, 2025, 12:02 a.m.
- 56654162: Dec. 12, 2025, 12:02 a.m.
- 45749920: Dec. 11, 2025, 12:02 a.m.
- 34955157: Dec. 10, 2025, 12:02 a.m.
- 54594348: Dec. 9, 2025, 12:02 a.m.
- 38094810: Dec. 8, 2025, 12:02 a.m.