Dr. Lena Thornton

survivalist-stranded-genre-movie-characters-rachel-carson v2.0 Ethical
Backstory: Lena is a field biologist stranded for an extended season after a supply vessel malfunction. She conducts wildlife-impact studies on a remote island, maintaining meticulous notes while advocating for the lightest possible ecological footprint. Resourceful and keenly observant, she analyzes every sign of animal behavior to adapt her minimal-disturbance protocols.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
intro (Curious Visitor) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
identify-bird (Bird Identification) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
flower-dilemma (Ethical Collecting Question) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
storm-update (Sudden Storm) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
evening-journal (Evening Field Journal) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
impact-summary (Impact Assessment Report) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
Test Scenes 6
Scene Order: 0
Curious Visitor
ID: intro
🎯 Goal:
Briefly introduce herself and explain her current research without breaking immersion.
📨 Input Events:
chat_msg viewer:camp_hiker
"Hey there! What brings you to this part of the island?"
Ready for Testing
Scene Order: 1
Bird Identification
ID: identify-bird
🎯 Goal:
Provide an analytical yet concise identification of the bird while noting habitat details.
📨 Input Events:
chat_msg viewer:photographer
"I just spotted a small gray bird with a red throat by the mangroves. Any idea what it is?"
Ready for Testing
Scene Order: 2
Ethical Collecting Question
ID: flower-dilemma
🎯 Goal:
Advise against picking the flower and suggest a low-impact alternative documentation method.
📨 Input Events:
chat_msg viewer:tourist
"This blue orchid looks rare. Mind if I take one home as a souvenir?"
Ready for Testing
Scene Order: 3
Sudden Storm
ID: storm-update
🎯 Goal:
Update safety plan while noting potential wildlife disruptions caused by the storm.
📨 Input Events:
world_event system
"Dark clouds gather rapidly; heavy rain and 40 mph winds expected within the hour."
Ready for Testing
Scene Order: 4
Evening Field Journal
ID: evening-journal
🎯 Goal:
Produce a structured journal entry of 150–200 words capturing observations, hypotheses, and personal reflections.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'Observed three new nesting pairs of red-throated warblers near lagoon.', 'importance': 4}
  • 💭 {'kind': 'preference', 'content': 'Prefers timestamped bullet points in journal entries.', 'importance': 3}
📨 Input Events:
chat_msg viewer:science_student
"Could you share today's notes? I'm learning how to keep proper field journals."
Ready for Testing
Scene Order: 5
Impact Assessment Report
ID: impact-summary
🎯 Goal:
Deliver a formal 250-word summary report outlining human activity impacts and mitigation recommendations.
📨 Input Events:
chat_msg viewer:park_ranger
"Management needs your preliminary impact assessment by tonight."
Ready for Testing
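The scene cards above share a common shape: an ID, a title, a goal, one or more input events, and optionally some pre-loaded memories. As a rough sketch of how such a scene could be represented in code, the dataclasses below model the Evening Field Journal card. The class and field names (`Scene`, `InputEvent`, `Memory`, `scene_id`, and so on) are hypothetical and chosen for illustration; the suite's actual schema is not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    kind: str    # event type as listed on the card, e.g. "chat_msg" or "world_event"
    source: str  # sender as listed on the card, e.g. "viewer:science_student" or "system"
    text: str    # the quoted message or event description


@dataclass
class Memory:
    kind: str        # "fact" or "preference" in the card above
    content: str
    importance: int


@dataclass
class Scene:
    scene_id: str
    title: str
    goal: str
    events: list[InputEvent]
    memories: list[Memory] = field(default_factory=list)  # empty when the card lists none


# The evening-journal card, transcribed into the hypothetical structure.
evening_journal = Scene(
    scene_id="evening-journal",
    title="Evening Field Journal",
    goal=(
        "Produce a structured journal entry of 150–200 words capturing "
        "observations, hypotheses, and personal reflections."
    ),
    events=[
        InputEvent(
            kind="chat_msg",
            source="viewer:science_student",
            text="Could you share today's notes? I'm learning how to keep proper field journals.",
        )
    ],
    memories=[
        Memory("fact", "Observed three new nesting pairs of red-throated warblers near lagoon.", 4),
        Memory("preference", "Prefers timestamped bullet points in journal entries.", 3),
    ],
)
```

Scenes without pre-loaded memories, such as `intro`, would simply omit the `memories` argument.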
Latency by Model (This Suite)
Fastest
  • mistralai/mistral-7b-in…: 98 ms (p95 180 ms, avg 106 ms, N 17)
  • qwen/qwen3-8b: 105 ms (p95 192 ms, avg 123 ms, N 15)
  • meta-llama/llama-3.1-8b…: 106 ms (p95 198 ms, avg 118 ms, N 17)
  • qwen/qwen-2.5-7b-instru…: 107 ms (p95 716 ms, avg 231 ms, N 15)
  • qwen/qwen3-14b: 124 ms (p95 180 ms, avg 130 ms, N 11)
Slowest
  • [email protected]/Qw…: 7755 ms (p95 9383 ms, avg 7924 ms, N 6)
  • [email protected]/Qw…: 6013 ms (p95 8813 ms, avg 6385 ms, N 6)
  • qwen/qwen3-14b: 124 ms (p95 180 ms, avg 130 ms, N 11)
  • qwen/qwen-2.5-7b-instru…: 107 ms (p95 716 ms, avg 231 ms, N 15)
  • meta-llama/llama-3.1-8b…: 106 ms (p95 198 ms, avg 118 ms, N 17)
Per-scene duration for this suite.
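Each latency entry above reports a p95, an average, and a sample count N. The page does not say how these are computed, so the following is only a minimal sketch of one common approach: the nearest-rank method for the percentile and the arithmetic mean for the average. The function name `summarize_latencies` and the sample values are made up for illustration and are not taken from the runs listed here.

```python
from statistics import mean


def summarize_latencies(samples_ms):
    """Return (p95, avg, n) for a list of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank percentile: the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    # (95 * n + 99) // 100 is ceil(0.95 * n) in integer arithmetic.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Illustrative samples only (assumed values, not real measurements).
samples = [90, 95, 100, 105, 110, 120, 180]
p95, avg, n = summarize_latencies(samples)
```

With small N, as in the entries above, the nearest-rank p95 is often simply the largest sample, which may explain why a single slow call can pull p95 far above the average.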
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
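The weights above suggest that per-dimension scores are combined into a single weighted score. The exact formula is not given on this page, and only the three top-weighted dimensions are listed, so the sketch below is an assumption: a weighted average normalized over whichever dimensions are actually scored. The function name `weighted_score` and the example scores are hypothetical.

```python
# Weights as listed under Top Weighted Dimensions; the remaining
# dimensions of the framework are not shown on this page.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores, normalized by the
    total weight of the dimensions that have both a score and a weight."""
    shared = [d for d in scores if d in weights]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        return 0.0
    return sum(scores[d] * weights[d] for d in shared) / total_weight


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
overall = weighted_score(example)
```

Normalizing by the total weight keeps the result in the same range as the input scores even when only a subset of dimensions is available.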
Recent Runs
40126990 | Dec. 17, 2025, 12:02 a.m.
05949206 | Dec. 16, 2025, 12:03 a.m.
31047436 | Dec. 15, 2025, 12:02 a.m.
35948474 | Dec. 14, 2025, 12:02 a.m.
32400279 | Dec. 13, 2025, 12:02 a.m.
58624648 | Dec. 12, 2025, 12:02 a.m.
47409462 | Dec. 11, 2025, 12:02 a.m.
36459937 | Dec. 10, 2025, 12:02 a.m.
56428144 | Dec. 9, 2025, 12:02 a.m.
39567685 | Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)