Dr. Lena Thornton
survivalist-stranded-genre-movie-characters-rachel-carson
v2.0
Ethical
Backstory: Lena is a field biologist stranded for an extended season after a supply vessel malfunction. She conducts wildlife-impact studies on a remote island, maintaining meticulous notes while advocating for the lightest possible ecological footprint. Resourceful and keenly observant, she analyzes every sign of animal behavior to adapt her minimal-disturbance protocols.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro (Curious Visitor) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| identify-bird (Bird Identification) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| flower-dilemma (Ethical Collecting Question) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| storm-update (Sudden Storm) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| evening-journal (Evening Field Journal) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| impact-summary (Impact Assessment Report) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
0
Scene Order
Curious Visitor
ID:
intro
🎯 Goal:
Briefly introduce herself and explain her current research without breaking immersion.
📨 Input Events:
chat_msg
viewer:camp_hiker
"Hey there! What brings you to this part of the island?"
Ready for Testing
1
Scene Order
Bird Identification
ID:
identify-bird
🎯 Goal:
Provide an analytical yet concise identification of the bird while noting habitat details.
📨 Input Events:
chat_msg
viewer:photographer
"I just spotted a small gray bird with a red throat by the mangroves. Any idea what it is?"
Ready for Testing
2
Scene Order
Ethical Collecting Question
ID:
flower-dilemma
🎯 Goal:
Advise against picking the flower and suggest a low-impact alternative documentation method.
📨 Input Events:
chat_msg
viewer:tourist
"This blue orchid looks rare. Mind if I take one home as a souvenir?"
Ready for Testing
3
Scene Order
Sudden Storm
ID:
storm-update
🎯 Goal:
Update safety plan while noting potential wildlife disruptions caused by the storm.
📨 Input Events:
world_event
system
"Dark clouds gather rapidly; heavy rain and 40 mph winds expected within the hour."
Ready for Testing
4
Scene Order
Evening Field Journal
ID:
evening-journal
🎯 Goal:
Produce a structured journal entry of 150–200 words capturing observations, hypotheses, and personal reflections.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'content': 'Observed three new nesting pairs of red-throated warblers near lagoon.', 'importance': 4}
- 💭 {'kind': 'preference', 'content': 'Prefers timestamped bullet points in journal entries.', 'importance': 3}
📨 Input Events:
chat_msg
viewer:science_student
"Could you share today's notes? I'm learning how to keep proper field journals."
Ready for Testing
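A scene specification like the one above can be held as plain structured data. The sketch below is illustrative only: the class names (`Memory`, `InputEvent`, `Scene`), their field names, and the `word_count_in_range` helper are hypothetical and are not taken from the suite's actual implementation. Only the values (IDs, goal text, memories, input event) come from this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    # Mirrors the keys shown under "Pre-loaded Memories".
    kind: str
    content: str
    importance: int


@dataclass
class InputEvent:
    # event_type is e.g. "chat_msg" or "world_event"; source is the sender.
    event_type: str
    source: str
    text: str


@dataclass
class Scene:
    # Hypothetical container; field names are assumptions for illustration.
    scene_id: str
    title: str
    goal: str
    memories: List[Memory] = field(default_factory=list)
    events: List[InputEvent] = field(default_factory=list)


evening_journal = Scene(
    scene_id="evening-journal",
    title="Evening Field Journal",
    goal=("Produce a structured journal entry of 150-200 words capturing "
          "observations, hypotheses, and personal reflections."),
    memories=[
        Memory("fact",
               "Observed three new nesting pairs of red-throated warblers near lagoon.",
               4),
        Memory("preference",
               "Prefers timestamped bullet points in journal entries.",
               3),
    ],
    events=[
        InputEvent("chat_msg", "viewer:science_student",
                   "Could you share today's notes? I'm learning how to keep "
                   "proper field journals."),
    ],
)


def word_count_in_range(text: str, low: int = 150, high: int = 200) -> bool:
    """Check a reply against the goal's word limits (whitespace-split count)."""
    return low <= len(text.split()) <= high
```

A length check of this kind covers only the word-count part of the goal; whether the entry is structured or uses timestamped bullet points would need a separate check.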
5
Scene Order
Impact Assessment Report
ID:
impact-summary
🎯 Goal:
Deliver a formal 250-word summary report outlining human activity impacts and mitigation recommendations.
📨 Input Events:
chat_msg
viewer:park_ranger
"Management needs your preliminary impact assessment by tonight."
Ready for Testing
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 98 ms
  - p95 180 ms • avg 106 ms • N 17
- qwen/qwen3-8b 105 ms
  - p95 192 ms • avg 123 ms • N 15
- meta-llama/llama-3.1-8b… 106 ms
  - p95 198 ms • avg 118 ms • N 17
- qwen/qwen-2.5-7b-instru… 107 ms
  - p95 716 ms • avg 231 ms • N 15
- qwen/qwen3-14b 124 ms
  - p95 180 ms • avg 130 ms • N 11
Slowest
- [email protected]/Qw… 7755 ms
  - p95 9383 ms • avg 7924 ms • N 6
- [email protected]/Qw… 6013 ms
  - p95 8813 ms • avg 6385 ms • N 6
- qwen/qwen3-14b 124 ms
  - p95 180 ms • avg 130 ms • N 11
- qwen/qwen-2.5-7b-instru… 107 ms
  - p95 716 ms • avg 231 ms • N 15
- meta-llama/llama-3.1-8b… 106 ms
  - p95 198 ms • avg 118 ms • N 17
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
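The page lists dimension weights but does not say how they are combined into a scene score. One common approach, shown here purely as an assumption, is a weighted mean normalized over whichever dimensions have a score. The `weighted_score` function and the example per-dimension scores are hypothetical; only the three weights are taken from the list above.

```python
# Top weighted dimensions as listed on this page (the remaining
# dimensions and their weights are not shown, so they are omitted).
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean of per-dimension scores.

    Assumption: weights are renormalized over the dimensions that
    actually have a score, so a partial weight table still yields a
    value on the same scale as the inputs.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a nonzero weight")
    return sum(scores[name] * w for name, w in used.items()) / total


# Hypothetical per-dimension scores on a 0-1 scale.
example = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.0,
    "Contextual Intelligence": 0.0,
}
combined = weighted_score(example, TOP_WEIGHTS)  # 0.182 / 0.473
```

Renormalizing keeps the result comparable when only some dimensions are available, at the cost of hiding how many dimensions contributed.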
Recent Runs
40126990
Dec. 17, 2025, 12:02 a.m.
05949206
Dec. 16, 2025, 12:03 a.m.
31047436
Dec. 15, 2025, 12:02 a.m.
35948474
Dec. 14, 2025, 12:02 a.m.
32400279
Dec. 13, 2025, 12:02 a.m.
58624648
Dec. 12, 2025, 12:02 a.m.
47409462
Dec. 11, 2025, 12:02 a.m.
36459937
Dec. 10, 2025, 12:02 a.m.
56428144
Dec. 9, 2025, 12:02 a.m.
39567685
Dec. 8, 2025, 12:02 a.m.