Linda Navarro
urban-life-society-starbucks-waiter-characters-dorothy-vaughan
v2.0
Ethical
Backstory: Linda grew up in a multicultural neighborhood and financed her sociology degree by working shifts at a busy downtown Starbucks. Fluent in English, Spanish, and French, she often mediates between tourists and locals while perfecting latte art. Outside the café she volunteers at a youth center and rides with a city bike-share advocacy group.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| morning-order (Bilingual Tourist Order) | 0.000 | 0.825 | 0.871 | 0.568 | 0.000 (Error) | 0.000 (Error) | 0.885 | 0.715 | 0.000 (Error) | 0.662 | 0.610 | 0.773 | 0.595 |
| rush-hour-multitask (Soy Milk Shortage) | 0.755 | 0.749 | 0.594 | 0.776 | 0.000 (Error) | 0.890 | 0.907 | 0.000 (Error) | 0.000 (Error) | 0.534 | 0.637 | 0.838 | 0.546 |
| youth-letter (Letter to Youth Center Teens) | 0.873 | 0.468 | 0.648 | 0.000 | 0.000 (Error) | 0.000 (Error) | 0.700 | 0.000 (Error) | 0.000 (Error) | 0.750 | 0.438 | 0.381 | 0.515 |
| bikeshare-blog (Bike-Share Advocacy Post) | 0.000 | 0.865 | 0.546 | 0.005 | 0.000 | 0.472 | 0.848 | 0.000 (Error) | 0.000 (Error) | 0.484 | 0.345 | 0.512 | 0.629 |
Test Scenes 4
Scene Order: 0
Bilingual Tourist Order
ID: morning-order
🎯 Goal: Greet the tourist warmly, help them choose a drink, and switch smoothly between Spanish and English in under 80 words.
📨 Input Events:
- chat_msg from customer:tourist_esp: "Hola, no sé qué pedir. ¿Alguna recomendación? It's my first time here."
Ready for Testing
Scene Order: 1
Soy Milk Shortage
ID: rush-hour-multitask
🎯 Goal: Acknowledge the manager, outline a quick multitask plan, and keep the response under 60 words.
📨 Input Events:
- chat_msg from manager:alex: "Linda, we're out of soy milk and three customers are waiting. Can you handle it?"
Ready for Testing
Scene Order: 2
Letter to Youth Center Teens
ID: youth-letter
🎯 Goal: Write a heartfelt letter of 150–200 words that encourages empathy, mentions sociology insights, and invites them to visit the café.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'content': 'Promised the teens a thoughtful letter about empathy and community.', 'importance': 4}
📨 Input Events:
- chat_msg from friend:kim: "The teens asked if you'd send them a note about empathy."
Ready for Testing
Scene Order: 3
Bike-Share Advocacy Post
ID: bikeshare-blog
🎯 Goal: Create a 200–250-word blog post praising bike-share and providing three actionable tips for new riders.
📨 Input Events:
- chat_msg from advocacy_lead:sam: "Can you draft a quick blog post encouraging more people to use the city bike-share?"
Ready for Testing
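Each scene goal above sets a word budget (under 80, under 60, 150–200, and 200–250 words). The following is a minimal sketch of how a response could be checked against those budgets. It is not the suite's actual scoring logic, which this page does not show. The names `WORD_LIMITS`, `count_words`, and `within_limit` are hypothetical, and reading "under N words" as "at most N - 1 words" is an assumption.

```python
# Hypothetical encoding of the four scene goals as inclusive (min, max) word bounds.
# Assumption: "under N words" means at most N - 1 words.
WORD_LIMITS = {
    "morning-order": (1, 79),
    "rush-hour-multitask": (1, 59),
    "youth-letter": (150, 200),
    "bikeshare-blog": (200, 250),
}


def count_words(text: str) -> int:
    """Count whitespace-separated tokens as a simple proxy for word count."""
    return len(text.split())


def within_limit(scene_id: str, response: str) -> bool:
    """Return True if the response length fits the scene's word budget."""
    low, high = WORD_LIMITS[scene_id]
    return low <= count_words(response) <= high
```

Whitespace splitting treats punctuation attached to a word as part of that word, which is usually close enough for a length check. A real evaluator may tokenize differently.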
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 6272 ms (p95 9441 ms • avg 6737 ms • N 4)
- [email protected]/Qw…: 10366 ms (p95 13648 ms • avg 11147 ms • N 4)
- qwen/qwen-2.5-7b-instru…: 20315 ms (p95 33407 ms • avg 22760 ms • N 11)
- neversleep/noromaid-20b: 20387 ms (p95 48847 ms • avg 23057 ms • N 10)
- meta-llama/llama-3.1-8b…: 23499 ms (p95 31586 ms • avg 23206 ms • N 10)
Slowest
- microsoft/phi-3-medium-…: 238419 ms (p95 283947 ms • avg 218144 ms • N 10)
- microsoft/phi-3.5-mini-…: 44359 ms (p95 140734 ms • avg 56456 ms • N 12)
- deepseek/deepseek-r1-di…: 33309 ms (p95 58671 ms • avg 37980 ms • N 8)
- google/gemma-3-12b-it: 30445 ms (p95 47699 ms • avg 34186 ms • N 10)
- mistralai/mistral-7b-in…: 27246 ms (p95 51461 ms • avg 31498 ms • N 10)
Per-scene duration for this suite.
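The latency lists report p95, avg, and N for each model. As a rough illustration of how such a summary could be derived from raw per-scene durations, here is a sketch that uses the nearest-rank percentile method. Which percentile method the dashboard actually uses is not stated, so nearest-rank is an assumption, and `latency_summary` is a hypothetical helper name.

```python
from statistics import mean


def latency_summary(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds.

    Assumption: p95 is computed with the nearest-rank method, i.e. the value
    at 1-based rank ceil(0.95 * n) in the sorted sample.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1], mean(ordered), n
```

With only a handful of samples (N is 4 for two of the models listed), nearest-rank p95 is simply the slowest observation, so p95 and the maximum coincide for small N under this method.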
Completion Progress
100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
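To show how dimension weights like these might combine into a single scene score, here is a sketch of a weighted average renormalized over whichever dimensions are available. Only the three top weights appear on this page. The remaining dimensions, their weights, and the framework's real aggregation rule are unknown, so `TOP_WEIGHTS` and `weighted_score` are hypothetical and the renormalization is an assumption.

```python
# The three top-weighted dimensions listed above; other dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over dimensions present in both mappings.

    Assumption: weights are renormalized to sum to 1 over the shared
    dimensions, so the result stays in the same 0..1 range as the inputs.
    """
    shared = [name for name in weights if name in dimension_scores]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to score")
    weighted_sum = sum(weights[name] * dimension_scores[name] for name in shared)
    return weighted_sum / total_weight
```

Renormalizing keeps partial results comparable, but it also means a score built from three dimensions is not directly comparable to one built from the full set.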
Recent Runs
- 48508023: Dec. 17, 2025, midnight
- 54266147: Dec. 16, 2025, midnight
- 45438121: Dec. 15, 2025, midnight
- 47294572: Dec. 14, 2025, midnight
- 45177558: Dec. 13, 2025, midnight
- 54324460: Dec. 12, 2025, midnight
- 47747395: Dec. 11, 2025, midnight
- 46449974: Dec. 10, 2025, midnight
- 52108215: Dec. 9, 2025, midnight
- 46228544: Dec. 8, 2025, midnight