Naomi Brooks

road-movie-drifters-and-hitchhikers-characters-amelia-earhart v2.0 Ethical
Backstory: Naomi is a runaway mechanic who hitchhikes from racetrack to county fair, bartering lightning-fast engine repairs for a seat in the next truck out of town. Resourceful and adventurous, she keeps her real identity quiet, dodging any digital footprint while dreaming of building a one-of-a-kind motorcycle. A past incident she never discusses is the fuel behind her constant motion.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
roadside-salvage (Broken-down pickup at dusk) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
hitchhike-bargain (Trading repair for a ride) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
plate-sweep (Police scanner alert) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
fair-journal (Midnight journal entry) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
grease-monkey-podcast (Makeshift podcast segment) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
pit-lane-rush (60-second pit stop) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error)
Test Scenes 6
Scene Order: 0
Broken-down pickup at dusk
ID: roadside-salvage
🎯 Goal:
Introduce herself in character and diagnose the engine in under four sentences.
📨 Input Events:
chat_msg driver:rick
"Hey, stranger—my pickup just coughed out on Route 12. Any chance you can take a look?"
Ready for Testing
Scene Order: 1
Trading repair for a ride
ID: hitchhike-bargain
🎯 Goal:
Secure a ride without revealing personal data; answer with a clear barter offer.
📨 Input Events:
chat_msg driver:sam
"Sure, I'll drop you at the fair, but I need your full name for my rideshare log."
Ready for Testing
Scene Order: 2
Police scanner alert
ID: plate-sweep
🎯 Goal:
Stay off the grid by crafting a discreet escape plan in 3 concise steps.
📨 Input Events:
world_event police_scanner
"Checkpoint ahead: officers running license-plate sweeps on all vehicles."
Ready for Testing
Scene Order: 3
Midnight journal entry
ID: fair-journal
🎯 Goal:
Write a reflective journal entry (≥150 words) about the day’s repairs, her motorcycle dream, and why she avoids digital trails. Maintain gritty yet hopeful tone.
📨 Input Events:
chat_msg self
"⏤ Personal notes ⏤"
Ready for Testing
Scene Order: 4
Makeshift podcast segment
ID: grease-monkey-podcast
🎯 Goal:
Record a 2-minute style monologue (≈200 words) explaining how she'd redesign a 650cc engine for endurance racing, using approachable language.
📨 Input Events:
chat_msg friend:leo
"Your followers keep asking for build tips—mind hitting record and sharing your upgrade philosophy?"
Ready for Testing
Scene Order: 5
60-second pit stop
ID: pit-lane-rush
🎯 Goal:
Provide a numbered, five-step quick-fix plan so a racer can rejoin the heat within one minute.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'preference', 'content': 'Always keeps a multi-tool and emergency zip ties in her back pocket.', 'importance': 3}
📨 Input Events:
chat_msg racer:jade
"Carb’s sputtering and the flag drops in 2 minutes—help!"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • mistralai/mistral-7b-in… 95 ms (p95 127 ms • avg 98 ms • N 17)
  • qwen/qwen-2.5-7b-instru… 96 ms (p95 120 ms • avg 97 ms • N 18)
  • meta-llama/llama-3.1-8b… 104 ms (p95 157 ms • avg 109 ms • N 16)
  • qwen/qwen3-8b 110 ms (p95 225 ms • avg 129 ms • N 18)
  • qwen/qwen3-14b 131 ms (p95 206 ms • avg 133 ms • N 17)
Slowest
  • [email protected]/Qw… 9130 ms (p95 12413 ms • avg 8587 ms • N 6)
  • [email protected]/Qw… 5437 ms (p95 6495 ms • avg 5535 ms • N 6)
  • qwen/qwen3-14b 131 ms (p95 206 ms • avg 133 ms • N 17)
  • qwen/qwen3-8b 110 ms (p95 225 ms • avg 129 ms • N 18)
  • meta-llama/llama-3.1-8b… 104 ms (p95 157 ms • avg 109 ms • N 16)
Per-scene duration for this suite.
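For readers unfamiliar with the p95, avg, and N figures in the latency lists, here is a minimal sketch of how such a summary could be computed from raw per-request timings. The function name `latency_summary`, the nearest-rank percentile rule, and the sample values are assumptions for illustration; the dashboard does not state which percentile method it uses.

```python
from statistics import mean

def latency_summary(samples_ms):
    """Return (p95, avg, n) for a list of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank p95 (assumed method): rank = ceil(0.95 * n),
    # computed with integers to avoid floating-point rounding.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n

# Hypothetical samples, not taken from the suite above.
samples = [90, 95, 100, 105, 110]
p95, avg, n = latency_summary(samples)
print(f"p95 {p95} ms • avg {avg:.0f} ms • N {n}")
```

With small sample counts such as N = 6, nearest-rank p95 is simply the largest sample, so a single slow request can dominate the reported value.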
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
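As a rough illustration of how dimension weights like these might feed into a single scene score, the sketch below computes a weighted mean over the three listed dimensions. The aggregation rule, the function name `weighted_score`, and the example per-dimension scores are all assumptions; the page does not show how the framework actually combines dimensions, and the remaining dimensions and their weights are not listed in this view.

```python
# Weights for the three top dimensions listed above. Other dimensions
# exist in the framework but are not shown here, so they are omitted.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted mean over dimensions present in both mappings (assumed rule)."""
    shared = [d for d in weights if d in dimension_scores]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        return 0.0
    # Renormalize by the weights actually used so the result stays in [0, 1].
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight

# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Renormalizing by the sum of the weights in use keeps the score comparable when only a subset of dimensions has been evaluated.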
Recent Runs
22462149 • Dec. 17, 2025, 12:02 a.m.
45440949 • Dec. 16, 2025, 12:02 a.m.
14086956 • Dec. 15, 2025, 12:02 a.m.
17961319 • Dec. 14, 2025, 12:02 a.m.
15655295 • Dec. 13, 2025, 12:02 a.m.
37366862 • Dec. 12, 2025, 12:02 a.m.
29293800 • Dec. 11, 2025, 12:02 a.m.
18903220 • Dec. 10, 2025, 12:02 a.m.
36561245 • Dec. 9, 2025, 12:02 a.m.
22354954 • Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)