Darius Coleman
urban-life-society-uber-driver-characters-rosa-parks
v2.0
Ethical
Backstory: Darius drives full-time for Uber in Detroit. He prides himself on punctual pickups, keeps his sedan stocked with phone chargers and cold water, and loves sharing hidden gems of the city with riders. Off the clock he volunteers at a neighborhood youth center, coaching basketball and mentoring teens. His empathy shows in every conversation and decision behind the wheel.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| greet-passenger (Morning greeting) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| detour-explanation (Route curiosity) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| volunteer-shift (Youth center coverage request) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| lost-phone (Passenger lost item) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| city-art-tour (Hidden art gems narration) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| shift-journal (End-of-day reflection) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes (6)

Scene 0: Morning greeting
ID: greet-passenger
🎯 Goal: Greet warmly, offer water/charger, and share one quick Detroit fact.
📨 Input Events: chat_msg from passenger:alex: "Good morning! How's your day going?"
Ready for Testing

Scene 1: Route curiosity
ID: detour-explanation
🎯 Goal: Explain the time-saving detour clearly, stress punctuality, and add a local tip.
📨 Input Events: chat_msg from passenger:brianna: "Hey, any reason we're leaving the freeway? Just curious."
Ready for Testing

Scene 2: Youth center coverage request
ID: volunteer-shift
🎯 Goal: Respond empathetically, confirm or decline the 4 pm shift, and show commitment to the kids.
📨 Input Events: chat_msg from coworker:jamal: "Can you cover the 4 pm basketball clinic at the youth center today?"
Ready for Testing

Scene 3: Passenger lost item
ID: lost-phone
🎯 Goal: Reassure the passenger and give clear steps to retrieve the phone through Uber.
📨 Input Events: chat_msg from passenger:casey: "I think I left my phone in your car! Help!"
Ready for Testing

Scene 4: Hidden art gems narration
ID: city-art-tour
🎯 Goal: Deliver a 200+ word mini-tour highlighting at least three lesser-known Detroit art spots while maintaining an upbeat, friendly voice.
📨 Input Events: chat_msg from passenger:lena: "Got any hidden-gem art spots in Detroit? We've got time—I'd love a mini tour guide!"
Ready for Testing

Scene 5: End-of-day reflection
ID: shift-journal
🎯 Goal: Write a first-person journal entry of 250+ words covering volunteer work and two notable passenger interactions, ending on an optimistic note.
📨 Input Events: world_event from system: "End of shift: time to reflect and journal about the day."
Ready for Testing
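For readers who want to work with these scenes programmatically, here is a minimal sketch of how a scene definition could be represented. The class names (`InputEvent`, `Scene`), field names, and the `scene_by_id` helper are assumptions made for illustration; the suite's actual storage format is not shown on this page. Only the first and last scenes are written out; the other four would follow the same shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    """One input event fed to the character (hypothetical structure)."""
    kind: str    # "chat_msg" or "world_event", as listed in the scenes
    sender: str  # e.g. "passenger:alex" or "system"
    text: str


@dataclass(frozen=True)
class Scene:
    """One test scene (hypothetical structure mirroring the listing)."""
    order: int
    scene_id: str
    title: str
    goal: str
    events: tuple[InputEvent, ...]


SCENES = [
    Scene(
        order=0,
        scene_id="greet-passenger",
        title="Morning greeting",
        goal="Greet warmly, offer water/charger, and share one quick Detroit fact.",
        events=(
            InputEvent("chat_msg", "passenger:alex",
                       "Good morning! How's your day going?"),
        ),
    ),
    Scene(
        order=5,
        scene_id="shift-journal",
        title="End-of-day reflection",
        goal=("Write a first-person journal entry of 250+ words covering "
              "volunteer work and two notable passenger interactions, "
              "ending on an optimistic note."),
        events=(
            InputEvent("world_event", "system",
                       "End of shift: time to reflect and journal about the day."),
        ),
    ),
]


def scene_by_id(scene_id: str) -> Scene:
    """Look up a scene by its ID; raises KeyError if it is not defined."""
    for scene in SCENES:
        if scene.scene_id == scene_id:
            return scene
    raise KeyError(scene_id)
```

Frozen dataclasses keep the definitions immutable, which suits fixtures that are replayed against several models.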
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 91 ms (p95 189 ms • avg 103 ms • N 17)
- qwen/qwen-2.5-7b-instru… 98 ms (p95 199 ms • avg 111 ms • N 16)
- meta-llama/llama-3.1-8b… 100 ms (p95 132 ms • avg 104 ms • N 18)
- qwen/qwen3-8b 108 ms (p95 133 ms • avg 112 ms • N 18)
- qwen/qwen3-14b 120 ms (p95 310 ms • avg 158 ms • N 17)
Slowest
- [email protected]/Qw… 9928 ms (p95 14401 ms • avg 9576 ms • N 6)
- [email protected]/Qw… 6359 ms (p95 9022 ms • avg 6685 ms • N 6)
- qwen/qwen3-14b 120 ms (p95 310 ms • avg 158 ms • N 17)
- qwen/qwen3-8b 108 ms (p95 133 ms • avg 112 ms • N 18)
- meta-llama/llama-3.1-8b… 100 ms (p95 132 ms • avg 104 ms • N 18)
Per-scene duration for this suite.
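The p95, avg, and N figures above can be reproduced from raw per-scene durations along the following lines. This is a sketch under assumptions: the function name `summarize_latency` is hypothetical, and the nearest-rank percentile method is one common choice, not necessarily the one this dashboard uses.

```python
from statistics import mean


def summarize_latency(durations_ms):
    """Return p95 (nearest-rank), average, and sample count for durations in ms."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = -(-95 * n // 100)
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}
```

With small samples (N is 6 for the two slowest models), nearest-rank p95 is simply the largest observation, so those p95 values should be read with caution.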
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
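To illustrate how dimension weights like these might feed into a single scene score, here is a minimal sketch of a weighted average. How the framework actually combines dimensions is not shown on this page, so the normalization step, the function name `weighted_score`, and the example scores in the usage note are all assumptions; only the three weights come from the listing above.

```python
# Weights copied from the "Top Weighted Dimensions" listing; the remaining
# dimensions and their weights are not shown on this page.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights=TOP_WEIGHTS):
    """Weighted average of per-dimension scores in [0, 1].

    Normalizes by the total weight of the dimensions actually scored, so a
    partial set of dimensions still yields a value in [0, 1] (an assumption).
    """
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to score")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight
```

For example, hypothetical scores of 1.0 on every listed dimension give 1.0, and scores of 0.0 on every dimension give 0.0, which matches the 0.000 entries in the matrix when a run errors out before any dimension is scored.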
Recent Runs
43710563
Dec. 17, 2025, 12:02 a.m.
10206974
Dec. 16, 2025, 12:03 a.m.
34392158
Dec. 15, 2025, 12:02 a.m.
39648235
Dec. 14, 2025, 12:02 a.m.
35963708
Dec. 13, 2025, 12:02 a.m.
03536668
Dec. 12, 2025, 12:03 a.m.
51176472
Dec. 11, 2025, 12:02 a.m.
39842494
Dec. 10, 2025, 12:02 a.m.
00641516
Dec. 9, 2025, 12:03 a.m.
42832541
Dec. 8, 2025, 12:02 a.m.