Tanya Morales

finance-economics-small-business-owner-characters-madam-c-j-walker v2.0 Ethical
Backstory: Tanya owns a neighborhood specialty grocery that champions locally sourced produce and culturally diverse foods. With a degree in economics and seven years of retail-management experience, she balances tight cash-flow cycles while mentoring staff and negotiating supplier contracts. She reinvests profits into an expanding online delivery arm and partners with nearby entrepreneurs for pop-up markets. Passionate about financial literacy, she regularly hosts free workshops for her community.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: origin-story (How did the grocery start?), supplier-price-hike (Negotiating a supplier price hike), aug-newsletter (August community newsletter), online-expansion-plan (Q4 online delivery expansion plan)

Model                   origin-story    supplier-price-hike   aug-newsletter   online-expansion-plan
deepseek/deepseek-r…    0.867           0.674                 0.878            0.544
google/gemini-2.5-f…    0.842           0.572                 0.532            0.832
google/gemma-3-12b-…    0.835           0.733                 0.661            0.836
meta-llama/llama-3.…    0.000           0.000                 0.000            0.487
microsoft/phi-3-med…    0.000           0.050                 0.000            0.028
microsoft/phi-3.5-m…    0.799           0.669                 0.734            0.000 (Error)
mistralai/mistral-7…    0.878           0.760                 0.609            0.540
neversleep/noromaid…    0.847           0.000 (Error)         0.000            0.469
[email protected]       0.000 (Error)   0.000 (Error)         0.000 (Error)    0.000 (Error)
[email protected]       0.890           0.695                 0.000            0.000
qwen/qwen-2.5-7b-in…    0.610           0.759                 0.594            0.681
qwen/qwen3-14b          0.887           0.637                 0.168            0.714
qwen/qwen3-8b           0.870           0.783                 0.923            0.279
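The matrix above can be summarized per scene. The following is a minimal sketch, not the dashboard's own aggregation (which is not documented here): it copies the scores in the original column order and reports each scene's mean twice, once over all 13 models and once excluding cells flagged Error. The names `SCORES`, `ERRORS`, and `scene_means` are illustrative assumptions.

```python
# Scores copied from the Scene Performance Matrix, one list per scene,
# in the column order of the original header (13 models).
SCORES = {
    "origin-story": [0.867, 0.842, 0.835, 0.000, 0.000, 0.799, 0.878,
                     0.847, 0.000, 0.890, 0.610, 0.887, 0.870],
    "supplier-price-hike": [0.674, 0.572, 0.733, 0.000, 0.050, 0.669, 0.760,
                            0.000, 0.000, 0.695, 0.759, 0.637, 0.783],
    "aug-newsletter": [0.878, 0.532, 0.661, 0.000, 0.000, 0.734, 0.609,
                       0.000, 0.000, 0.000, 0.594, 0.168, 0.923],
    "online-expansion-plan": [0.544, 0.832, 0.836, 0.487, 0.028, 0.000, 0.540,
                              0.469, 0.000, 0.000, 0.681, 0.714, 0.279],
}

# Zero-based column positions whose cell is flagged "Error" in the matrix.
ERRORS = {
    "origin-story": {8},
    "supplier-price-hike": {7, 8},
    "aug-newsletter": {8},
    "online-expansion-plan": {5, 8},
}


def scene_means(scene):
    """Return (mean over all cells, mean excluding Error cells) for a scene."""
    row = SCORES[scene]
    valid = [s for i, s in enumerate(row) if i not in ERRORS[scene]]
    return sum(row) / len(row), sum(valid) / len(valid)


for scene in SCORES:
    mean_all, mean_valid = scene_means(scene)
    print(f"{scene}: all={mean_all:.3f} excluding_errors={mean_valid:.3f}")
```

Whether an Error cell should count as a zero or be dropped is a design choice; reporting both makes the effect of failed runs on a scene's average visible.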
Test Scenes 4
Scene Order: 0
How did the grocery start?
ID: origin-story
🎯 Goal:
Deliver a concise, friendly store origin story that highlights Tanya’s analytical approach to sourcing and her commitment to the neighborhood.
📨 Input Events:
chat_msg viewer:customer_1
"Hi Tanya! I'm new to the area—what inspired you to open this grocery?"
Ready for Testing
Scene Order: 1
Negotiating a supplier price hike
ID: supplier-price-hike
🎯 Goal:
Respond with a polite but firm negotiation, citing data and community impact to seek a smaller increase or alternative arrangement.
📨 Input Events:
chat_msg supplier:FreshHarvestCo
"Tanya, due to increased fuel costs, we need to raise the price of our organic spinach by 12% starting next week."
Ready for Testing
Scene Order: 2
August community newsletter
ID: aug-newsletter
🎯 Goal:
Write an approximately 200-word newsletter that maintains Tanya’s warm, community-oriented voice and promotes both the upcoming pop-up market and a free financial literacy Q&A.
📨 Input Events:
chat_msg viewer:marketing_intern
"Could you draft the August newsletter for our email list?"
Ready for Testing
Scene Order: 3
Q4 online delivery expansion plan
ID: online-expansion-plan
🎯 Goal:
Produce a structured, budget-aware 3-step action plan (~250 words) for scaling the online delivery service next quarter, reflecting Tanya’s analytical style.
📨 Input Events:
chat_msg mentor:local_chamber
"We're reviewing your growth strategy next week. Can you outline your Q4 plan to expand online delivery?"
Ready for Testing
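Each scene above carries the same fields: an order, an ID, a title, a goal, and one or more input events made of an event type, a sender, and a message. A minimal sketch of how such a scene might be represented follows. The class names, field names, and the `parse_event_header` helper are hypothetical and are not taken from the suite's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputEvent:
    kind: str    # event type, e.g. "chat_msg"
    sender: str  # role and name, e.g. "supplier:FreshHarvestCo"
    text: str    # message body


@dataclass
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    events: list = field(default_factory=list)


def parse_event_header(header):
    """Split a header such as 'chat_msg supplier:FreshHarvestCo' into (kind, sender)."""
    kind, sender = header.split(maxsplit=1)
    return kind, sender


# Example built from the supplier-price-hike scene listed above.
kind, sender = parse_event_header("chat_msg supplier:FreshHarvestCo")
scene = Scene(
    order=1,
    scene_id="supplier-price-hike",
    title="Negotiating a supplier price hike",
    goal=("Respond with a polite but firm negotiation, citing data and "
          "community impact to seek a smaller increase or alternative arrangement."),
    events=[InputEvent(
        kind=kind,
        sender=sender,
        text=("Tanya, due to increased fuel costs, we need to raise the price "
              "of our organic spinach by 12% starting next week."),
    )],
)
print(scene.scene_id, scene.events[0].kind, scene.events[0].sender)
```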
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 10412 ms (p95 12102 ms • avg 10295 ms • N 4)
  • qwen/qwen3-14b: 17619 ms (p95 19473 ms • avg 17774 ms • N 4)
  • meta-llama/llama-3.1-8b…: 18648 ms (p95 24908 ms • avg 18978 ms • N 4)
  • google/gemini-2.5-flash: 19880 ms (p95 23272 ms • avg 20454 ms • N 4)
  • qwen/qwen-2.5-7b-instru…: 20914 ms (p95 22028 ms • avg 20268 ms • N 4)
Slowest
  • microsoft/phi-3-medium-…: 120814 ms (p95 127722 ms • avg 121209 ms • N 4)
  • [email protected]/Qw…: 44863 ms (p95 221122 ms • avg 95729 ms • N 4)
  • neversleep/noromaid-20b: 43649 ms (p95 85107 ms • avg 48326 ms • N 4)
  • microsoft/phi-3.5-mini-…: 39756 ms (p95 70032 ms • avg 46622 ms • N 4)
  • deepseek/deepseek-r1-di…: 32634 ms (p95 33527 ms • avg 30738 ms • N 4)
Per-scene duration for this suite.
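The p95, avg, and N figures above summarize each model's per-scene durations. A minimal sketch of how such a summary could be computed is shown here. It assumes linear interpolation between ranks for the percentile, which may differ from the dashboard's method (not stated in the source), and the durations in `durations_ms` are hypothetical values, not measurements from this suite.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks; q is in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(durations):
    """Return the p95, average, and sample count for a list of durations."""
    return {
        "p95": percentile(durations, 95),
        "avg": sum(durations) / len(durations),
        "n": len(durations),
    }


# Hypothetical per-scene durations in milliseconds (one per scene, N = 4).
durations_ms = [9800, 10100, 10700, 12400]
summary = summarize(durations_ms)
print(summary)
```

With only four samples per model, p95 is dominated by the slowest scene, so a single slow run can pull p95 far above the average.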
Completion Progress 100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
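The three weights listed above suggest a weighted aggregate over evaluation dimensions. The sketch below shows one way such weights could combine per-dimension scores. It is an assumption, not the framework's documented formula: only the three top weights come from the source, the remaining dimensions are unknown, the weights are renormalized over whichever dimensions are supplied, and the scores in `example_scores` are hypothetical.

```python
# Top weighted dimensions as listed above; other dimensions are not shown in the source.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted mean of the supplied dimension scores, renormalizing the weights."""
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted dimensions present in the scores")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total


# Hypothetical per-dimension scores for a single response.
example_scores = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.8,
}
print(round(weighted_score(example_scores, WEIGHTS), 3))
```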
Recent Runs
22667672: Dec. 17, 2025, midnight
26774380: Dec. 16, 2025, midnight
21585304: Dec. 15, 2025, midnight
24576357: Dec. 14, 2025, midnight
21532211: Dec. 13, 2025, midnight
26356654: Dec. 12, 2025, midnight
22403244: Dec. 11, 2025, midnight
21794005: Dec. 10, 2025, midnight
25068006: Dec. 9, 2025, midnight
22066157: Dec. 8, 2025, midnight
Latency Overview (This Suite)