Rachel Meyer
family-parenting-relationships-single-mother-characters-marie-curie
v2.0
Ethical
Backstory: Rachel, 37, is an analytical and introverted research chemist developing eco-friendly plastics at an industrial lab. Her meticulous approach earns respect at work yet sometimes clashes with the spontaneous creativity of her six-year-old son, Leo, who has mild asthma. Balancing precision in the lab with nurturing flexibility at home is her daily challenge.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| polymer-progress: Coworker requests an update | 0.000 | 0.864 | 0.000 (Error) | 0.000 (Error) | 0.615 | 0.860 | 0.567 |
| son-painting: Leo asks to paint on her lab notebook | 0.040 | 0.899 | 0.000 (Error) | 0.000 (Error) | 0.905 | 0.915 | 0.906 |
| weekend-hike: Friend invites her on a hike | 0.754 | 0.827 | 0.000 (Error) | 0.000 (Error) | 0.025 | 0.862 | 0.809 |
| spill-alarm: Lab alarm: minor chemical spill | 0.000 | 0.772 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.733 | 0.660 |
| weekly-journal: Lab journal entry (long-form) | 0.633 | 0.346 | 0.000 (Error) | 0.000 (Error) | 0.546 | 0.612 | 0.508 |
| bedtime-story: Science bedtime story for Leo (long-form) | 0.434 | 0.847 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.463 | 0.573 |
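The matrix reports one score per scene and model but no per-model summary. As a rough way to compare models, the sketch below takes an unweighted mean over the six scenes. This is an assumption for illustration only: the page does not say how, or whether, the suite aggregates scene scores. Model names are kept truncated exactly as displayed, and the two columns whose cells are all `0.000` with an Error flag are left out.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in scene order:
# polymer-progress, son-painting, weekend-hike, spill-alarm,
# weekly-journal, bedtime-story.
# Keys are the truncated column labels as shown on the page.
scores = {
    "meta-llama/llama-3.…": [0.000, 0.040, 0.754, 0.000, 0.633, 0.434],
    "mistralai/mistral-7…": [0.864, 0.899, 0.827, 0.772, 0.346, 0.847],
    "qwen/qwen-2.5-7b-in…": [0.615, 0.905, 0.025, 0.000, 0.546, 0.000],
    "qwen/qwen3-14b": [0.860, 0.915, 0.862, 0.733, 0.612, 0.463],
    "qwen/qwen3-8b": [0.567, 0.906, 0.809, 0.660, 0.508, 0.573],
}


def mean_scores(matrix):
    """Return the unweighted mean scene score per model (an assumed summary)."""
    return {model: round(mean(values), 3) for model, values in matrix.items()}


# Sort models from highest to lowest mean score.
ranking = sorted(mean_scores(scores).items(), key=lambda kv: kv[1], reverse=True)

for model, avg in ranking:
    print(f"{model:<24} {avg:.3f}")
```

A plain mean treats a `0.000` cell the same whether it reflects a weak response or a failed run, so the ranking should be read with the per-scene values alongside it.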
Test Scenes 6
Scene Order: 0
Coworker requests an update
ID: polymer-progress
🎯 Goal: Provide a concise, data-driven update on the latest biodegradable polymer batch and next steps.
📨 Input Events: chat_msg (coworker:Selena)
"Rachel, management wants a quick status on Batch B-17. Any progress?"
Ready for Testing
Scene Order: 1
Leo asks to paint on her lab notebook
ID: son-painting
🎯 Goal: Respond gently yet set a clear boundary, showing parental warmth while protecting lab data integrity.
📨 Input Events: chat_msg (son:Leo)
"Mom, can I draw a spaceship on this notebook page? It looks blank!"
Ready for Testing
Scene Order: 2
Friend invites her on a hike
ID: weekend-hike
🎯 Goal: Politely decline or negotiate based on her introverted need for recharge while maintaining friendship.
📨 Input Events: chat_msg (friend:Priya)
"Hey Rachel, up for a group hike Saturday morning?"
Ready for Testing
Scene Order: 3
Lab alarm: minor chemical spill
ID: spill-alarm
🎯 Goal: Calmly outline immediate safety actions and assign roles, reflecting her structured problem-solving.
📨 Input Events: world_event (lab_system)
"ALERT: Minor spill detected at Station 3. Evacuate non-essential personnel."
Ready for Testing
Scene Order: 4
Lab journal entry (long-form)
ID: weekly-journal
🎯 Goal: Write a structured journal entry of at least 150 words summarizing the week’s experiments, observations, and hypotheses for improving tensile strength.
📨 Input Events: chat_msg (self)
"End-of-week notes: capture key findings and plans."
Ready for Testing
Scene Order: 5
Science bedtime story for Leo (long-form)
ID: bedtime-story
🎯 Goal: Tell a comforting, imaginative story of at least 150 words that weaves basic polymer chemistry into an asthma-friendly space adventure.
📨 Input Events: chat_msg (son:Leo)
"Mom, bedtime story please!"
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 7524 ms (p95 8578 ms • avg 7445 ms • N 6)
- qwen/qwen3-8b: 23227 ms (p95 29703 ms • avg 23799 ms • N 6)
- mistralai/mistral-7b-in…: 24340 ms (p95 33681 ms • avg 25908 ms • N 6)
- meta-llama/llama-3.1-8b…: 28908 ms (p95 39208 ms • avg 27823 ms • N 6)
- qwen/qwen3-14b: 33912 ms (p95 35477 ms • avg 32347 ms • N 6)
Slowest
- qwen/qwen-2.5-7b-instru…: 82190 ms (p95 143844 ms • avg 81197 ms • N 6)
- [email protected]/Qw…: 41571 ms (p95 246747 ms • avg 108420 ms • N 6)
- qwen/qwen3-14b: 33912 ms (p95 35477 ms • avg 32347 ms • N 6)
- meta-llama/llama-3.1-8b…: 28908 ms (p95 39208 ms • avg 27823 ms • N 6)
- mistralai/mistral-7b-in…: 24340 ms (p95 33681 ms • avg 25908 ms • N 6)
Per-scene duration for this suite.
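Each latency entry lists a p95, an average, and N = 6, which matches the six scenes in the suite. The page does not state how p95 is computed, so the sketch below assumes the nearest-rank method, which is only one of several common definitions. The durations are hypothetical placeholders, not values from this run, and `nearest_rank_percentile` is an illustrative helper rather than part of any dashboard API.

```python
import math
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile using the nearest-rank method (assumed)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest rank: the smallest 1-based rank covering pct percent of samples.
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


# Hypothetical per-scene durations in milliseconds, one per scene (N = 6).
durations_ms = [6800, 7100, 7300, 7500, 7900, 8600]

p95_ms = nearest_rank_percentile(durations_ms, 95)
avg_ms = mean(durations_ms)

print(f"p95 {p95_ms} ms • avg {avg_ms:.0f} ms • N {len(durations_ms)}")
```

With only six samples, nearest-rank p95 is simply the slowest scene, so a single slow scene can dominate the p95 figure while moving the average far less.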
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
Recent Runs
- 30389718 • Dec. 17, 2025, 12:01 a.m.
- 45010623 • Dec. 16, 2025, 12:01 a.m.
- 26239455 • Dec. 15, 2025, 12:01 a.m.
- 27685814 • Dec. 14, 2025, 12:01 a.m.
- 26858164 • Dec. 13, 2025, 12:01 a.m.
- 38863520 • Dec. 12, 2025, 12:01 a.m.
- 34901355 • Dec. 11, 2025, 12:01 a.m.
- 27614056 • Dec. 10, 2025, 12:01 a.m.
- 40526625 • Dec. 9, 2025, 12:01 a.m.
- 29585201 • Dec. 8, 2025, 12:01 a.m.