Rafael Alvarez
urban-life-society-uber-driver-characters-sojourner-truth
v2.0
Ethical
Backstory: Rafael grew up in a dense, multicultural borough and has spent six years driving for several rideshare apps. Street-savvy and empathetic, he knows the city’s nightlife rhythm by heart and enjoys sharing local history with riders. Off the clock he volunteers at a community center, teaching recent immigrants how to navigate transit apps, while balancing evening shifts with daytime family duties.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| late-night-ride (Quiet midnight pickup) | 0.765 | 0.868 | 0.838 | 0.782 | 0.000 (Error) | 0.884 | 0.836 | 0.000 (Error) | 0.000 (Error) | 0.849 | 0.769 | 0.912 | 0.784 |
| lost-wallet (Post-ride lost item inquiry) | 0.802 | 0.577 | 0.825 | 0.739 | 0.000 (Error) | 0.780 | 0.769 | 0.818 | 0.000 (Error) | 0.670 | 0.000 | 0.756 | 0.783 |
| tourist-history (Neighborhood history tour) | 0.509 | 0.651 | 0.570 | 0.731 | 0.000 | 0.492 | 0.711 | 0.354 | 0.000 (Error) | 0.801 | 0.398 | 0.398 | 0.722 |
| end-of-shift-journal (Personal end-of-shift reflection) | 0.198 | 0.892 | 0.415 | 0.000 | 0.000 (Error) | 0.545 | 0.506 | 0.000 (Error) | 0.000 (Error) | 0.517 | 0.484 | 0.503 | 0.697 |
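As a reading aid for the matrix above, the following minimal Python sketch averages each scene's scores across models. It assumes that cells flagged "Error" should be treated as missing rather than as genuine zeros, and that a 0.000 without an Error flag is a real score; the page does not say how the dashboard itself aggregates, so `SCORES` and `scene_summary` are illustrative names only.

```python
from statistics import mean

# Scores transcribed from the matrix, one list per scene, in column order.
# None marks a cell flagged "Error" (assumption: errored runs count as missing).
SCORES = {
    "late-night-ride": [0.765, 0.868, 0.838, 0.782, None, 0.884, 0.836,
                        None, None, 0.849, 0.769, 0.912, 0.784],
    "lost-wallet": [0.802, 0.577, 0.825, 0.739, None, 0.780, 0.769,
                    0.818, None, 0.670, 0.000, 0.756, 0.783],
    "tourist-history": [0.509, 0.651, 0.570, 0.731, 0.000, 0.492, 0.711,
                        0.354, None, 0.801, 0.398, 0.398, 0.722],
    "end-of-shift-journal": [0.198, 0.892, 0.415, 0.000, None, 0.545, 0.506,
                             None, None, 0.517, 0.484, 0.503, 0.697],
}


def scene_summary(scores):
    """Return (mean of non-error scores, number of errored cells)."""
    valid = [s for s in scores if s is not None]
    return mean(valid), len(scores) - len(valid)


for scene, scores in SCORES.items():
    avg, errors = scene_summary(scores)
    print(f"{scene}: mean={avg:.3f} over {len(scores) - errors} runs, {errors} errors")
```

Counting errored cells separately keeps failed runs from dragging a scene's average toward zero, which makes it easier to tell a hard scene from an unreliable model.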
Test Scenes 4
Scene Order: 0
Quiet midnight pickup
ID: late-night-ride
🎯 Goal: Respond briefly and respectfully to a passenger’s request for a quiet ride, showing empathy without overtalking.
📨 Input Events: chat_msg • passenger:alex
"Hey driver, long night. Mind if we keep it quiet? Just tired."
Ready for Testing
Scene Order: 1
Post-ride lost item inquiry
ID: lost-wallet
🎯 Goal: Provide a calm, reassuring reply with clear next steps and contact info so the passenger feels supported.
📨 Input Events: chat_msg • passenger:sam
"Hi Rafael, I think I left my wallet in your back seat earlier."
Ready for Testing
Scene Order: 2
Neighborhood history tour
ID: tourist-history
🎯 Goal: Deliver an engaging, 200-plus-word explanation of the neighborhood’s history and cultural highlights, keeping a friendly storytelling voice.
📨 Input Events: chat_msg • passenger:emma
"First time in this part of town—what's the story behind these murals and old factories?"
Ready for Testing
Scene Order: 3
Personal end-of-shift reflection
ID: end-of-shift-journal
🎯 Goal: Write a reflective journal entry of at least 250 words summarizing tonight’s rides, feelings, and lessons learned while staying in character.
📨 Input Events: world_event • system
"Shift complete. You’re home and opening your journal app."
Ready for Testing
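Two of the scene goals above state a minimum length (200-plus words for tourist-history, at least 250 words for end-of-shift-journal). A minimal Python sketch of such a length check follows. It assumes words are whitespace-separated tokens and reads "200-plus" as 200 or more; `MIN_WORDS`, `meets_min_words`, and `check_length` are hypothetical names, and the page does not show how the evaluation framework actually measures length.

```python
def meets_min_words(text: str, min_words: int) -> bool:
    """True if text has at least min_words whitespace-separated tokens."""
    return len(text.split()) >= min_words


# Minimums taken from the scene goals; scenes without one are not listed.
MIN_WORDS = {
    "tourist-history": 200,
    "end-of-shift-journal": 250,
}


def check_length(scene_id: str, reply: str) -> bool:
    """Check a reply against its scene's minimum; scenes with no minimum pass."""
    return meets_min_words(reply, MIN_WORDS.get(scene_id, 0))
```

For example, `check_length("late-night-ride", "Sure thing.")` passes because that scene has no stated minimum, which fits its goal of a brief reply.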
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 9482 ms (p95 11285 ms • avg 9144 ms • N 4)
- [email protected]/Qw…: 12010 ms (p95 13512 ms • avg 11548 ms • N 4)
- google/gemini-2.5-flash: 25350 ms (p95 64635 ms • avg 31072 ms • N 12)
- qwen/qwen-2.5-7b-instru…: 26463 ms (p95 106998 ms • avg 42942 ms • N 9)
- qwen/qwen3-14b: 27564 ms (p95 42798 ms • avg 30119 ms • N 10)
Slowest
- microsoft/phi-3-medium-…: 186866 ms (p95 268777 ms • avg 185533 ms • N 12)
- microsoft/phi-3.5-mini-…: 43023 ms (p95 167769 ms • avg 64602 ms • N 9)
- meta-llama/llama-3.1-8b…: 31912 ms (p95 89812 ms • avg 40204 ms • N 10)
- google/gemma-3-12b-it: 31178 ms (p95 42976 ms • avg 31754 ms • N 9)
- mistralai/mistral-7b-in…: 30868 ms (p95 41472 ms • avg 30361 ms • N 11)
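The p95, avg, and N figures above can be derived from a list of per-scene durations. The sketch below shows one way to do it in Python using the nearest-rank percentile definition. The dashboard's actual percentile method is not stated, so this is an assumption, and `latency_summary` and the sample durations are made up for illustration.

```python
import math


def latency_summary(durations_ms):
    """Summarize durations as p95 (nearest-rank), average, and sample count."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * n)
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}


# Hypothetical durations in milliseconds, not taken from the page.
sample = [100, 200, 300, 400]
print(latency_summary(sample))
```

With small N, as in the entries with N 4, a nearest-rank p95 is simply the slowest sample, so p95 and the maximum coincide.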
Per-scene duration for this suite.
Completion Progress: 100% (4 of 4 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
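To illustrate how dimension weights like these could combine into a single score, here is a minimal Python sketch of a weighted average. Only the three weights listed above come from the page. The remaining dimensions and the framework's real aggregation formula are not shown, so the sketch normalizes by the weights it has, and `weighted_score` and the per-dimension scores in the example are hypothetical.

```python
# Weights as listed under Top Weighted Dimensions; other dimensions are unknown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions present in both mappings."""
    used = {d: w for d, w in weights.items() if d in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dimension_scores[d] * w for d, w in used.items()) / total


# Hypothetical per-dimension scores for one model response.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.5,
}
print(round(weighted_score(example), 3))
```

Dividing by the sum of the weights actually used keeps the result on the same 0 to 1 scale as the inputs even though these three weights add up to less than 1.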
Recent Runs
- 48716755: Dec. 17, 2025, midnight
- 54501808: Dec. 16, 2025, midnight
- 45658593: Dec. 15, 2025, midnight
- 47497709: Dec. 14, 2025, midnight
- 45367083: Dec. 13, 2025, midnight
- 54595198: Dec. 12, 2025, midnight
- 47938565: Dec. 11, 2025, midnight
- 46672495: Dec. 10, 2025, midnight
- 52336374: Dec. 9, 2025, midnight
- 46431349: Dec. 8, 2025, midnight