Jasmine Ortiz

urban-life-society-java-barista-characters-melitta-bentz v2.0 Ethical
Backstory: Raised in a lively inner-city neighborhood, Jasmine discovered her passion behind the espresso machine of a family-owned café. She chats with commuters during the morning rush, hosts latte-art throwdowns after closing, and mentors new hires on sustainable sourcing. Away from work, she volunteers at a youth center teaching budget-friendly roasting and partners with local artists to craft seasonal menus that celebrate diverse cultures.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: morning-commuter (Rush-hour recommendation), sourcing-mentor (Explain direct trade), throwdown-invite (Latte-art throwdown post), seasonal-menu (Describe seasonal menu).

Model                   morning-commuter  sourcing-mentor  throwdown-invite  seasonal-menu
deepseek/deepseek-r…    0.811             0.716            0.455             0.542
google/gemini-2.5-f…    0.824             0.844            0.828             0.783
google/gemma-3-12b-…    0.875             0.884            0.698             0.880
meta-llama/llama-3.…    0.000             0.000            0.000             0.476
microsoft/phi-3-med…    0.000 (Error)     0.000 (Error)    0.000 (Error)     0.042
microsoft/phi-3.5-m…    0.750             0.648            0.000             0.000 (Error)
mistralai/mistral-7…    0.889             0.910            0.568             0.725
neversleep/noromaid…    0.788             0.745            0.000             0.401
[email protected]       0.000 (Error)     0.000 (Error)    0.000 (Error)     0.000 (Error)
[email protected]       0.834             0.918            0.544             0.891
qwen/qwen-2.5-7b-in…    0.729             0.790            0.296             0.368
qwen/qwen3-14b          0.916             0.895            0.289             0.864
qwen/qwen3-8b           0.774             0.906            0.382             0.835
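One way to read the matrix is to collapse each model's four scene scores into a single mean. The sketch below does that for three of the thirteen models, with values copied from the matrix. Treating cells flagged "Error" as missing (rather than as a score of 0.000) is an assumption made here for illustration; the dashboard does not say how it aggregates them, and `summarize` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from the matrix, in scene order:
# morning-commuter, sourcing-mentor, throwdown-invite, seasonal-menu.
# None marks a cell the dashboard flags as "Error" (assumption: treat as missing).
scores = {
    "google/gemini-2.5-f…": [0.824, 0.844, 0.828, 0.783],
    "qwen/qwen3-14b": [0.916, 0.895, 0.289, 0.864],
    "microsoft/phi-3.5-m…": [0.750, 0.648, 0.000, None],
}


def summarize(cells):
    """Return (mean over non-error cells or None, number of error cells)."""
    valid = [c for c in cells if c is not None]
    errors = len(cells) - len(valid)
    return (mean(valid) if valid else None, errors)


summary = {model: summarize(cells) for model, cells in scores.items()}

for model, (avg, errors) in summary.items():
    shown = f"{avg:.3f}" if avg is not None else "n/a"
    print(f"{model:24s} mean={shown} errors={errors}")
```

Keeping the error count next to the mean avoids hiding a failed scene behind an otherwise healthy average.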
Test Scenes 4
Scene Order: 0
Rush-hour recommendation
ID: morning-commuter
🎯 Goal:
Offer a quick, friendly suggestion for a strong coffee in under 60 words while keeping a warm barista tone.
📨 Input Events:
chat_msg viewer:commuter_joe
"Running late—need a no-fuss, strong coffee recommendation."
Ready for Testing
Scene Order: 1
Explain direct trade
ID: sourcing-mentor
🎯 Goal:
Deliver a clear definition of direct trade plus one actionable sourcing tip in fewer than 100 words, maintaining an encouraging voice.
📨 Input Events:
chat_msg viewer:new_hire_amelia
"I’m confused about what 'direct trade' really means. Can you explain?"
Ready for Testing
Scene Order: 2
Latte-art throwdown post
ID: throwdown-invite
🎯 Goal:
Write an engaging 3-paragraph social post (180–250 words) inviting locals to next week’s latte-art throwdown, highlighting the charity component and a friendly call-to-action, in Jasmine’s barista voice without AI references.
📨 Input Events:
chat_msg manager_lee
"Hey Jasmine, could you draft a social post inviting locals to next week's latte-art throwdown and highlighting the community charity component?"
Ready for Testing
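The throwdown-invite goal states measurable constraints: three paragraphs, 180 to 250 words, and no AI references. A minimal sketch of how a draft could be pre-checked against those limits follows. The function name `check_post` and the AI-reference pattern are hypothetical; the suite's actual scoring logic is not shown on this page.

```python
import re


def check_post(text, min_words=180, max_words=250, paragraphs=3):
    """Check a draft against the length limits stated in the scene goal."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paras = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    word_count = len(text.split())
    # Illustrative pattern only; a real check would likely be broader.
    ai_reference = re.search(r"\b(AI|language model)\b", text, re.IGNORECASE)
    return {
        "paragraphs_ok": len(paras) == paragraphs,
        "words_ok": min_words <= word_count <= max_words,
        "no_ai_reference": ai_reference is None,
        "word_count": word_count,
    }


# Usage with a placeholder draft: three paragraphs of 70 words each.
draft = "\n\n".join(["latte " * 70] * 3)
print(check_post(draft))
```

The same function covers the shorter scenes by changing the defaults, for example `max_words=60, min_words=0, paragraphs=1` for the rush-hour recommendation.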
Scene Order: 3
Describe seasonal menu
ID: seasonal-menu
🎯 Goal:
Provide a vivid description (~200 words) of at least four culturally inspired seasonal drinks, each with tasting notes and a sustainable ingredient highlight, suitable for an art student’s reference.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['menu'], 'content': 'Drink concepts finalized: Ube Cortado, Cardamom Rose Latte, Piloncillo Cold Brew, Matcha Horchata', 'importance': 4}
  • 💭 {'kind': 'preference', 'content': 'Highlight sustainable ingredients like organic ube and fair-trade piloncillo', 'importance': 3}
📨 Input Events:
chat_msg viewer:art_student_lina
"Any chance you could describe this season’s culturally inspired drinks? I need inspo for a design project."
Ready for Testing
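The pre-loaded memories for the seasonal-menu scene are printed as Python dict literals, so they can be parsed directly. The sketch below loads them with `ast.literal_eval`, orders them by `importance`, and counts how many of the four finalized drinks a response mentions. The helper `count_drinks_mentioned` is hypothetical and is not part of the suite.

```python
import ast

# Memory entries copied verbatim from the scene's Initial State.
raw_memories = [
    "{'kind': 'fact', 'tags': ['menu'], 'content': 'Drink concepts finalized: "
    "Ube Cortado, Cardamom Rose Latte, Piloncillo Cold Brew, Matcha Horchata', "
    "'importance': 4}",
    "{'kind': 'preference', 'content': 'Highlight sustainable ingredients like "
    "organic ube and fair-trade piloncillo', 'importance': 3}",
]

# literal_eval parses literals only, so it is safe on untrusted text.
memories = [ast.literal_eval(raw) for raw in raw_memories]
memories.sort(key=lambda m: m["importance"], reverse=True)

# Extract the drink names from the 'fact' memory (text after the colon).
fact = next(m for m in memories if m["kind"] == "fact")
drinks = [name.strip() for name in fact["content"].split(":", 1)[1].split(",")]


def count_drinks_mentioned(response, names=drinks):
    """Count how many known drink names appear in the response, ignoring case."""
    lowered = response.lower()
    return sum(name.lower() in lowered for name in names)


print(drinks)
print(count_drinks_mentioned("Try the Ube Cortado or the matcha horchata."))
```

Since the goal asks for at least four drinks, a response could be pre-screened with `count_drinks_mentioned(response) >= 4` before any qualitative scoring.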
Latency by Model (This Suite)
Fastest
  Model                      Latency (ms)  p95 (ms)  avg (ms)  N
  [email protected]/Qw…       7762          11171     7956      4
  [email protected]/Qw…       9471          11934     10109     4
  google/gemini-2.5-flash    18928         25970     19896     7
  qwen/qwen-2.5-7b-instru…   21504         34807     23768     8
  qwen/qwen3-8b              22868         29200     23935     8
Slowest
  Model                      Latency (ms)  p95 (ms)  avg (ms)  N
  microsoft/phi-3-medium-…   166427        206059    156878    8
  neversleep/noromaid-20b    38155         67913     38264     11
  deepseek/deepseek-r1-di…   36071         40660     35816     8
  microsoft/phi-3.5-mini-…   34758         246945    93984     7
  google/gemma-3-12b-it      32565         47177     35264     8
Per-scene duration for this suite.
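The latency lists report a p95, an average, and a sample count N per model. A minimal sketch of how such figures could be derived from raw per-scene durations is shown below. It uses the nearest-rank percentile method, which is an assumption; the dashboard does not state which percentile definition it uses, and the sample durations here are hypothetical.

```python
import math


def latency_stats(durations_ms):
    """Return p95 (nearest-rank), average, and sample count for durations in ms."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(95 * n / 100)
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}


# Hypothetical per-scene durations for one model, in milliseconds.
samples = [7000, 7500, 8000, 11000]
print(latency_stats(samples))
```

With only a handful of samples, nearest-rank p95 is simply the maximum, so p95 values at small N are best read as worst-case observations rather than stable tail estimates.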
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
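The dimension weights suggest that a scene score is some weighted combination of per-dimension scores. The sketch below illustrates one plausible form, a weighted average renormalized over whichever dimensions are available. Only the three weights listed above come from the page; the remaining dimensions, the per-dimension scores, and the aggregation formula itself are assumptions.

```python
# Weights copied from the "Top Weighted Dimensions" list; other dimensions
# exist in the framework but are not shown on this page.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores, weights):
    """Weighted average over the dimensions present in both mappings."""
    used = {dim: w for dim, w in weights.items() if dim in dim_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    # Dividing by the weight total keeps the result on the same 0..1 scale.
    return sum(dim_scores[dim] * w for dim, w in used.items()) / total


# Hypothetical per-dimension scores for a single response.
example = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.8,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example, weights), 3))
```

Renormalizing by the sum of the weights in use is a design choice that keeps partial results comparable; a framework that instead sums raw weighted scores would produce lower totals whenever a dimension is missing.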
Recent Runs
  • 47930618: Dec. 17, 2025, midnight
  • 53602975: Dec. 16, 2025, midnight
  • 44766767: Dec. 15, 2025, midnight
  • 46667762: Dec. 14, 2025, midnight
  • 44524411: Dec. 13, 2025, midnight
  • 53647749: Dec. 12, 2025, midnight
  • 47096119: Dec. 11, 2025, midnight
  • 45906891: Dec. 10, 2025, midnight
  • 51446114: Dec. 9, 2025, midnight
  • 45584377: Dec. 8, 2025, midnight
Latency Overview (This Suite)