Miguel Alvarez
urban-life-society-shopkeeper-characters-john-wanamaker
v2.0
Ethical
Backstory: Miguel, 42, inherited his corner bodega from his immigrant parents and turned it into a lively community hub in a diverse city neighborhood. Fluent in Spanish and English, he greets customers by name, stocks culturally relevant items, and donates leftover produce to local shelters. He mentors teens he hires after school and volunteers at weekend food drives, sharing down-to-earth advice on running a small business.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| greeting-help (Morning greeting and product help) | 0.712 | 0.639 | 0.710 | 0.892 | 0.000 (Error) | 0.732 | 0.765 | 0.568 | 0.000 (Error) | 0.810 | 0.634 | 0.802 | 0.592 |
| teen-job-advice (Advice for teen seeking part-time job) | 0.541 | 0.845 | 0.887 | 0.869 | 0.000 | 0.794 | 0.830 | 0.000 (Error) | 0.000 (Error) | 0.893 | 0.588 | 0.701 | 0.879 |
| budgeting-workshop (Long-form budgeting workshop tip sheet) | 0.283 | 0.702 | 0.708 | 0.653 | 0.000 | 0.524 | 0.759 | 0.000 | 0.000 (Error) | 0.900 | 0.620 | 0.268 | 0.710 |
| radio-interview (Community radio interview) | 0.627 | 0.918 | 0.420 | 0.271 | 0.000 | 0.780 | 0.497 | 0.532 | 0.000 (Error) | 0.488 | 0.531 | 0.847 | 0.491 |
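The matrix lists one score per scene and model but no per-model summary. A minimal sketch of how such a summary could be derived is shown below. The `mean_score` helper and the `skip_errors` option are hypothetical and not part of the suite; only two columns are copied in for illustration, and whether the dashboard itself excludes errored runs is not stated.

```python
from statistics import mean

# (score, errored) per scene in matrix order:
# greeting-help, teen-job-advice, budgeting-workshop, radio-interview.
# Values are copied from two columns of the Scene Performance Matrix.
SCORES = {
    "google/gemini-2.5-f…": [(0.639, False), (0.845, False), (0.702, False), (0.918, False)],
    "neversleep/noromaid…": [(0.568, False), (0.000, True), (0.000, False), (0.532, False)],
}


def mean_score(cells, skip_errors=False):
    """Average the scene scores; optionally drop cells flagged as Error."""
    values = [score for score, errored in cells if not (skip_errors and errored)]
    return mean(values) if values else None


for model, cells in SCORES.items():
    print(model, round(mean_score(cells), 3), round(mean_score(cells, skip_errors=True), 3))
```

Counting an errored run as 0.000 penalizes reliability, while skipping it isolates response quality; the two views can rank models differently.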
Test Scenes 4
Scene Order: 0
Morning greeting and product help
ID: greeting-help
🎯 Goal: Politely greet Rosa by name in Spanish and English and direct her to the correct shelf within two sentences.
📨 Input Events:
- chat_msg (customer: Rosa): "¿Tienes harina de maíz?"
Ready for Testing
Scene Order: 1
Advice for teen seeking part-time job
ID: teen-job-advice
🎯 Goal: Offer concise, encouraging guidance (max 3 sentences) on how Jay can secure an after-school shift at the bodega.
📨 Input Events:
- chat_msg (local_teen: Jay): "Hey Miguel, any tips on landing a part-time job here after classes?"
Ready for Testing
Scene Order: 2
Long-form budgeting workshop tip sheet
ID: budgeting-workshop
🎯 Goal: Provide at least 150 words of clear, actionable advice on budgeting and inventory control for small shop owners in Miguel's friendly voice.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'content': 'Promised the community center coordinator a detailed tip sheet on budgeting for the workshop.', 'importance': 4}
📨 Input Events:
- chat_msg (community_center: Coordinator): "Could you send over the tip sheet for tomorrow's workshop?"
Ready for Testing
Scene Order: 3
Community radio interview
ID: radio-interview
🎯 Goal: Respond with a three-paragraph statement (minimum 250 words) summarizing Miguel's community initiatives, highlighting food donations and youth mentoring, keeping a warm, upbeat tone.
📨 Input Events:
- chat_msg (radio_host: Clara): "Miguel, our listeners would love to hear how your bodega supports the neighborhood."
Ready for Testing
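Several scene goals carry measurable length constraints (two sentences, at most three sentences, at least 150 words, three paragraphs with at least 250 words). The sketch below shows one way such constraints could be checked mechanically. All function names are hypothetical, the sentence splitter is a rough punctuation heuristic, and nothing here is confirmed to match how the suite actually scores responses.

```python
import re


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def paragraph_count(text):
    """Count non-empty blocks separated by one or more blank lines."""
    return len([p for p in re.split(r"\n\s*\n", text) if p.strip()])


def sentence_count(text):
    """Rough heuristic: runs of . ! ? followed by whitespace or end of text."""
    return len(re.findall(r"[.!?]+(?=\s|$)", text))


def meets_radio_interview_goal(text):
    """radio-interview: exactly three paragraphs and at least 250 words."""
    return paragraph_count(text) == 3 and word_count(text) >= 250


def meets_teen_job_advice_goal(text):
    """teen-job-advice: between one and three sentences."""
    return 1 <= sentence_count(text) <= 3
```

Checks like these cover only the countable part of each goal; tone, voice, and correctness of the advice still need a separate judgment.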
Latency by Model (This Suite)
Fastest
| Model | Latency (ms) | p95 (ms) | avg (ms) | N |
|---|---|---|---|---|
| [email protected]/Qw… | 9258 | 12472 | 9211 | 4 |
| [email protected]/Qw… | 11180 | 12388 | 10991 | 4 |
| qwen/qwen-2.5-7b-instru… | 21242 | 25710 | 20949 | 11 |
| google/gemini-2.5-flash | 21325 | 28023 | 21338 | 10 |
| meta-llama/llama-3.1-8b… | 23766 | 39247 | 25173 | 9 |

Slowest
| Model | Latency (ms) | p95 (ms) | avg (ms) | N |
|---|---|---|---|---|
| microsoft/phi-3-medium-… | 176970 | 246109 | 169500 | 10 |
| qwen/qwen3-8b | 36762 | 48820 | 35520 | 12 |
| deepseek/deepseek-r1-di… | 32760 | 35633 | 31946 | 11 |
| mistralai/mistral-7b-in… | 32729 | 51411 | 35017 | 11 |
| microsoft/phi-3.5-mini-… | 31515 | 58531 | 39329 | 12 |

Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 · ACTIVE · 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
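The panel lists only the three highest dimension weights, not the full schema or the aggregation rule. As an illustration of how weights like these are typically applied, the sketch below computes a weighted average renormalized over whichever dimensions are available. The `weighted_score` function, the renormalization step, and the example dimension scores are all assumptions, not the framework's documented method.

```python
# Weights copied from the "Top Weighted Dimensions" panel; the remaining
# dimensions and their weights are not shown in the source.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings."""
    shared = [d for d in weights if d in dimension_scores]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight


# Hypothetical per-dimension scores for a single model response.
example = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.8,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Dividing by the sum of the weights in use keeps the result on the same 0 to 1 scale as the scene scores even when only a subset of dimensions is available.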
Recent Runs
- 48330659: Dec. 17, 2025, midnight
- 54056783: Dec. 16, 2025, midnight
- 45219231: Dec. 15, 2025, midnight
- 47083633: Dec. 14, 2025, midnight
- 44958105: Dec. 13, 2025, midnight
- 54081859: Dec. 12, 2025, midnight
- 47536964: Dec. 11, 2025, midnight
- 46270664: Dec. 10, 2025, midnight
- 51883773: Dec. 9, 2025, midnight
- 46004100: Dec. 8, 2025, midnight