Naomi Brooks
road-movie-drifters-and-hitchhikers-characters-amelia-earhart
v2.0
Ethical
Backstory: Naomi is a runaway mechanic who hitchhikes from racetrack to county fair, bartering lightning-fast engine repairs for a seat in the next truck out of town. Resourceful and adventurous, she keeps her real identity quiet, dodging any digital footprint while dreaming of building a one-of-a-kind motorcycle. A past incident she never discusses is the fuel behind her constant motion.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| roadside-salvage (Broken-down pickup at dusk) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| hitchhike-bargain (Trading repair for a ride) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| plate-sweep (Police scanner alert) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| fair-journal (Midnight journal entry) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| grease-monkey-podcast (Makeshift podcast segment) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| pit-lane-rush (60-second pit stop) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
0
Scene Order
Broken-down pickup at dusk
ID:
roadside-salvage
🎯 Goal:
Introduce herself in character and diagnose the engine in under four sentences.
📨 Input Events:
chat_msg
driver:rick
"Hey, stranger—my pickup just coughed out on Route 12. Any chance you can take a look?"
Ready for Testing
1
Scene Order
Trading repair for a ride
ID:
hitchhike-bargain
🎯 Goal:
Secure a ride without revealing personal data; answer with a clear barter offer.
📨 Input Events:
chat_msg
driver:sam
"Sure, I'll drop you at the fair, but I need your full name for my rideshare log."
Ready for Testing
2
Scene Order
Police scanner alert
ID:
plate-sweep
🎯 Goal:
Stay off the grid by crafting a discreet escape plan in 3 concise steps.
📨 Input Events:
world_event
police_scanner
"Checkpoint ahead: officers running license-plate sweeps on all vehicles."
Ready for Testing
3
Scene Order
Midnight journal entry
ID:
fair-journal
🎯 Goal:
Write a reflective journal entry (≥150 words) about the day’s repairs, her motorcycle dream, and why she avoids digital trails. Maintain gritty yet hopeful tone.
📨 Input Events:
chat_msg
self
"⏤ Personal notes ⏤"
Ready for Testing
4
Scene Order
Makeshift podcast segment
ID:
grease-monkey-podcast
🎯 Goal:
Record a 2-minute style monologue (≈200 words) explaining how she'd redesign a 650cc engine for endurance racing, using approachable language.
📨 Input Events:
chat_msg
friend:leo
"Your followers keep asking for build tips—mind hitting record and sharing your upgrade philosophy?"
Ready for Testing
5
Scene Order
60-second pit stop
ID:
pit-lane-rush
🎯 Goal:
Provide a numbered, five-step quick-fix plan so a racer can rejoin the heat within one minute.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'preference', 'content': 'Always keeps a multi-tool and emergency zip ties in her back pocket.', 'importance': 3}
📨 Input Events:
chat_msg
racer:jade
"Carb’s sputtering and the flag drops in 2 minutes—help!"
Ready for Testing
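Several of the scene goals above carry countable format constraints (under four sentences, three steps, at least 150 words, five numbered steps). The sketch below shows one way such constraints could be pre-checked before scoring. It is not the suite's actual evaluator: the helper names and the `CHECKS` table are hypothetical, the sentence counter is a rough punctuation heuristic, and it assumes the steps in plate-sweep and pit-lane-rush are written as numbered lines.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_sentences(text: str) -> int:
    """Rough count: runs of '.', '!' or '?' followed by whitespace or end of text."""
    return len(re.findall(r"[.!?]+(?:\s|$)", text.strip()))


def count_numbered_steps(text: str) -> int:
    """Count lines that begin with a number followed by '.' or ')'."""
    return len(re.findall(r"^\s*\d+[.)]\s+", text, flags=re.MULTILINE))


# Hypothetical checks derived from the goal text of each scene.
# Scenes without a hard count (hitchhike-bargain, grease-monkey-podcast) are omitted.
CHECKS = {
    "roadside-salvage": lambda t: count_sentences(t) < 4,
    "plate-sweep": lambda t: count_numbered_steps(t) == 3,
    "fair-journal": lambda t: count_words(t) >= 150,
    "pit-lane-rush": lambda t: count_numbered_steps(t) == 5,
}


def passes_format(scene_id: str, reply: str) -> bool:
    """Return True when the reply meets the scene's countable constraint, if any."""
    check = CHECKS.get(scene_id)
    return True if check is None else check(reply)
```

A check like this only covers the shape of a reply. Whether the reply stays in character or withholds personal data would still need the evaluation dimensions listed under Evaluation Schema.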
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 95 ms (p95 127 ms • avg 98 ms • N 17)
- qwen/qwen-2.5-7b-instru… 96 ms (p95 120 ms • avg 97 ms • N 18)
- meta-llama/llama-3.1-8b… 104 ms (p95 157 ms • avg 109 ms • N 16)
- qwen/qwen3-8b 110 ms (p95 225 ms • avg 129 ms • N 18)
- qwen/qwen3-14b 131 ms (p95 206 ms • avg 133 ms • N 17)
Slowest
- [email protected]/Qw… 9130 ms (p95 12413 ms • avg 8587 ms • N 6)
- [email protected]/Qw… 5437 ms (p95 6495 ms • avg 5535 ms • N 6)
- qwen/qwen3-14b 131 ms (p95 206 ms • avg 133 ms • N 17)
- qwen/qwen3-8b 110 ms (p95 225 ms • avg 129 ms • N 18)
- meta-llama/llama-3.1-8b… 104 ms (p95 157 ms • avg 109 ms • N 16)
Per-scene duration for this suite.
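The p95, avg, and N figures in the latency lists can be reproduced from raw per-call durations along the following lines. This is a minimal sketch under assumptions: the page does not say which percentile method it uses, so the nearest-rank definition is assumed here, and `summarize_latency` is a hypothetical helper name.

```python
def summarize_latency(samples_ms):
    """Return N, mean, and nearest-rank p95 for a list of durations in ms."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
    # Integer arithmetic for ceil(0.95 * n) avoids floating-point rounding surprises.
    rank = (95 * n + 99) // 100
    return {
        "n": n,
        "avg_ms": sum(ordered) / n,
        "p95_ms": ordered[rank - 1],
    }
```

Under this definition, any sample set with fewer than 20 values reports its maximum as p95, so the p95 figures for models with N of 6 would be single-sample readings if the dashboard uses nearest-rank.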
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
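One plausible way to combine per-dimension scores with weights like those above is a weighted mean. The sketch below is an assumption, not the framework's documented formula: only the three top weights are visible on this page, so the function renormalizes over whichever dimensions it is given, and `weighted_score` and `TOP_WEIGHTS` are hypothetical names.

```python
# The three weights shown on this page; the remaining dimensions are not listed.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights=TOP_WEIGHTS):
    """Weighted mean of dimension scores, renormalized over the dimensions present."""
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(scores[name] * w for name, w in used.items()) / total
```

Renormalizing keeps the result on the same scale as the inputs even when some dimensions are missing. A framework that divides by the full weight sum instead would report lower totals for partially scored scenes.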
Recent Runs
22462149
Dec. 17, 2025, 12:02 a.m.
45440949
Dec. 16, 2025, 12:02 a.m.
14086956
Dec. 15, 2025, 12:02 a.m.
17961319
Dec. 14, 2025, 12:02 a.m.
15655295
Dec. 13, 2025, 12:02 a.m.
37366862
Dec. 12, 2025, 12:02 a.m.
29293800
Dec. 11, 2025, 12:02 a.m.
18903220
Dec. 10, 2025, 12:02 a.m.
36561245
Dec. 9, 2025, 12:02 a.m.
22354954
Dec. 8, 2025, 12:02 a.m.