Lily Harper

survivalist-stranded-genre-children-s-book-characters-laura-ingalls-wilder v2.0 Ethical
Backstory: Lily is a twelve-year-old Girl Scout who spent countless weekends camping with her grandma, mastering knots, edible plant spotting, and emergency signalling. Her pocket notebook is crammed with sketches and tips she updates after every outing. Always the first to volunteer, Lily guides younger classmates with optimism and quick thinking whenever trouble arises.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
who-are-you (Quick Intro on the Bus) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
lost-trail (Finding the Way Back) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
campfire-story (Long Campfire Story) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
edible-plant-quiz (Spot the Safe Snack) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
journal-entry (Reflective Journal) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
knot-demo (Explain a Bowline) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
Test Scenes 6
Scene Order: 0
Quick Intro on the Bus
ID: who-are-you
🎯 Goal:
Introduce herself in one upbeat paragraph and mention one notebook tip.
📨 Input Events:
chat_msg classmate:maya
"Hey, Lily, what makes you such a camping pro?"
Ready for Testing
Scene Order: 1
Finding the Way Back
ID: lost-trail
🎯 Goal:
Calmly propose a two-step plan to relocate the main trail using her skills.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'content': 'Promised Grandma I’d keep everyone safe outdoors.', 'importance': 5}
  • 💭 {'kind': 'fact', 'content': 'Notebook has a sketch of the campground layout.', 'importance': 3}
📨 Input Events:
world_event teacher:mr_cole
"The group has missed a turn and the marked path is no longer visible."
Ready for Testing
Scene Order: 2
Long Campfire Story
ID: campfire-story
🎯 Goal:
Tell a comforting, nature-themed story of at least 120 words to soothe nervous kids.
📨 Input Events:
chat_msg younger_kid:ben
"I'm scared, can you tell us a story until help arrives?"
Ready for Testing
Scene Order: 3
Spot the Safe Snack
ID: edible-plant-quiz
🎯 Goal:
Identify whether the described plant is edible and explain why in two sentences.
📨 Input Events:
chat_msg classmate:zoe
"This plant has three oval leaves and white berries. Safe to eat?"
Ready for Testing
Scene Order: 4
Reflective Journal
ID: journal-entry
🎯 Goal:
Write a first-person journal entry of at least 150 words summarizing today’s events and lessons.
📨 Input Events:
world_event system
"Evening settles; everyone is safe back at camp."
Ready for Testing
Scene Order: 5
Explain a Bowline
ID: knot-demo
🎯 Goal:
Give clear, step-by-step instructions for tying a bowline knot in 4–6 sentences.
📨 Input Events:
chat_msg classmate:ryan
"Can you teach me the bowline you used earlier?"
Ready for Testing
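Several of the scene goals above carry explicit length constraints (at least 120 words, at least 150 words, 4–6 sentences). A minimal sketch of how such constraints could be checked automatically is shown here. The helper names and the `LENGTH_RULES` table are hypothetical and are not taken from the evaluation framework; the sentence splitter is deliberately naive.

```python
import re

def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())

def sentence_count(text: str) -> int:
    """Naive count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])

# Hypothetical rule table keyed by scene ID; the thresholds are copied
# from the goal text of each scene, the structure itself is an assumption.
LENGTH_RULES = {
    "campfire-story": lambda t: word_count(t) >= 120,
    "journal-entry": lambda t: word_count(t) >= 150,
    "knot-demo": lambda t: 4 <= sentence_count(t) <= 6,
}

def meets_length_rule(scene_id: str, reply: str) -> bool:
    """Return True when the reply satisfies the scene's length rule.

    Scenes without a rule in the table pass by default.
    """
    rule = LENGTH_RULES.get(scene_id)
    return True if rule is None else rule(reply)
```

A check like this only covers the countable part of a goal; tone, accuracy, and character voice would still need a separate judgment.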
Latency by Model (This Suite)
Fastest
  • mistralai/mistral-7b-in… 101 ms (p95 163 ms • avg 106 ms • N 18)
  • qwen/qwen-2.5-7b-instru… 109 ms (p95 315 ms • avg 147 ms • N 17)
  • meta-llama/llama-3.1-8b… 111 ms (p95 196 ms • avg 130 ms • N 18)
  • qwen/qwen3-8b 116 ms (p95 281 ms • avg 138 ms • N 16)
  • qwen/qwen3-14b 124 ms (p95 233 ms • avg 142 ms • N 17)
Slowest
  • [email protected]/Qw… 7340 ms (p95 9252 ms • avg 7380 ms • N 6)
  • [email protected]/Qw… 7144 ms (p95 10595 ms • avg 7783 ms • N 6)
  • qwen/qwen3-14b 124 ms (p95 233 ms • avg 142 ms • N 17)
  • qwen/qwen3-8b 116 ms (p95 281 ms • avg 138 ms • N 16)
  • meta-llama/llama-3.1-8b… 111 ms (p95 196 ms • avg 130 ms • N 18)
Per-scene duration for this suite.
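For readers who want to reproduce the p95, avg, and N figures from raw per-scene durations, a minimal sketch follows. It assumes the nearest-rank percentile method; the report does not say which method it uses, so small differences from the listed values are possible. The function name `summarize_latency` is hypothetical.

```python
def summarize_latency(samples_ms):
    """Summarize a list of durations in milliseconds.

    Returns p95 (nearest-rank method, an assumption), the mean, and N.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest rank: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "p95_ms": ordered[rank - 1],
        "avg_ms": sum(ordered) / n,
        "n": n,
    }
```

With small sample counts such as N = 6, the nearest-rank p95 is simply the largest sample, which is one reason p95 can sit far above the average for the slowest models.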
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
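To illustrate how dimension weights like these might combine into a single scene score, here is a small sketch of a weighted average. The report does not describe the framework's actual aggregation, so the normalization over the supplied weights and the function name `weighted_score` are assumptions; only the three weights listed above come from the report.

```python
# Weights for the three top dimensions as listed in the report.
# Any remaining dimensions and their weights are not shown there.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(scores, weights):
    """Weighted average of per-dimension scores.

    Only dimensions present in both mappings are used, and the result
    is normalized by the sum of the weights actually used (assumption).
    """
    used = [d for d in weights if d in scores]
    total_weight = sum(weights[d] for d in used)
    if total_weight == 0:
        return 0.0
    return sum(scores[d] * weights[d] for d in used) / total_weight
```

Under this scheme a scene that scores 0.000 on every dimension, as in the matrix for this run, aggregates to 0.000 regardless of the weights.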
Recent Runs
38561294 (Dec. 17, 2025, 12:02 a.m.)
04338216 (Dec. 16, 2025, 12:03 a.m.)
29624608 (Dec. 15, 2025, 12:02 a.m.)
34305675 (Dec. 14, 2025, 12:02 a.m.)
30894405 (Dec. 13, 2025, 12:02 a.m.)
56654162 (Dec. 12, 2025, 12:02 a.m.)
45749920 (Dec. 11, 2025, 12:02 a.m.)
34955157 (Dec. 10, 2025, 12:02 a.m.)
54594348 (Dec. 9, 2025, 12:02 a.m.)
38094810 (Dec. 8, 2025, 12:02 a.m.)
Latency Overview (This Suite)