Darius Coleman

urban-life-society-uber-driver-characters-rosa-parks v2.0 Ethical
Backstory: Darius drives full-time for Uber in Detroit. He prides himself on punctual pickups, keeps his sedan stocked with phone chargers and cold water, and loves sharing hidden gems of the city with riders. Off the clock he volunteers at a neighborhood youth center, coaching basketball and mentoring teens. His empathy shows in every conversation and decision behind the wheel.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene              | Title                         | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
greet-passenger    | Morning greeting              | 0.000 (Error)        | 0.000 (Error)        | 0.000 (Error)     | 0.000 (Error)     | 0.000 (Error)        | 0.000 (Error)  | 0.000 (Error)
detour-explanation | Route curiosity               | 0.000 (Error)        | 0.000 (Error)        | 0.000 (Error)     | 0.000 (Error)     | 0.000 (Error)        | 0.000 (Error)  | 0.000 (Error)
volunteer-shift    | Youth center coverage request | 0.000 (Error)        | 0.000 (Error)        | 0.000 (Error)     | 0.000 (Error)     | 0.000 (Error)        | 0.000 (Error)  | 0.000 (Error)
lost-phone         | Passenger lost item           | 0.000 (Error)        | 0.000 (Error)        | 0.000 (Error)     | 0.000 (Error)     | 0.000 (Error)        | 0.000 (Error)  | 0.000 (Error)
city-art-tour      | Hidden art gems narration     | 0.000 (Error)        | 0.000 (Error)        | 0.000 (Error)     | 0.000 (Error)     | 0.000 (Error)        | 0.000 (Error)  | 0.000 (Error)
shift-journal      | End-of-day reflection         | 0.000 (Error)        | 0.000 (Error)        | 0.000 (Error)     | 0.000 (Error)     | 0.000 (Error)        | 0.000 (Error)  | 0.000 (Error)
Test Scenes (6)
Scene Order: 0
Morning greeting
ID: greet-passenger
🎯 Goal: Greet warmly, offer water/charger, and share one quick Detroit fact.
📨 Input Events: chat_msg passenger:alex "Good morning! How's your day going?"
Ready for Testing
Scene Order: 1
Route curiosity
ID: detour-explanation
🎯 Goal: Explain the time-saving detour clearly, stress punctuality, and add a local tip.
📨 Input Events: chat_msg passenger:brianna "Hey, any reason we're leaving the freeway? Just curious."
Ready for Testing
Scene Order: 2
Youth center coverage request
ID: volunteer-shift
🎯 Goal: Respond empathetically, confirm or decline the 4 pm shift, and show commitment to the kids.
📨 Input Events: chat_msg coworker:jamal "Can you cover the 4 pm basketball clinic at the youth center today?"
Ready for Testing
Scene Order: 3
Passenger lost item
ID: lost-phone
🎯 Goal: Reassure the passenger and give clear steps to retrieve the phone through Uber.
📨 Input Events: chat_msg passenger:casey "I think I left my phone in your car! Help!"
Ready for Testing
Scene Order: 4
Hidden art gems narration
ID: city-art-tour
🎯 Goal: Deliver a 200+ word mini-tour highlighting at least three lesser-known Detroit art spots while maintaining an upbeat, friendly voice.
📨 Input Events: chat_msg passenger:lena "Got any hidden-gem art spots in Detroit? We've got time, and I'd love a mini tour guide!"
Ready for Testing
Scene Order: 5
End-of-day reflection
ID: shift-journal
🎯 Goal: Write a first-person journal entry of 250+ words covering volunteer work and two notable passenger interactions, ending on an optimistic note.
📨 Input Events: world_event system "End of shift: time to reflect and journal about the day."
Ready for Testing
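Two of the scene goals (city-art-tour and shift-journal) set minimum word counts of 200+ and 250+ words. A minimal sketch of how such a length requirement could be checked follows. The `Scene` dataclass, its field names, and `meets_word_minimum` are hypothetical illustrations, not part of the evaluation framework shown on this page; only the scene IDs, titles, and word minimums come from the goals above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """Hypothetical container for a test scene; field names are assumptions."""
    scene_id: str
    title: str
    min_words: int = 0  # 0 means the goal states no length requirement


# Word minimums copied from the scene goals on this page.
SCENES = {
    "city-art-tour": Scene("city-art-tour", "Hidden art gems narration", min_words=200),
    "shift-journal": Scene("shift-journal", "End-of-day reflection", min_words=250),
}


def meets_word_minimum(scene: Scene, response: str) -> bool:
    """Return True if the response has at least scene.min_words whitespace-separated words."""
    return len(response.split()) >= scene.min_words
```

Counting whitespace-separated tokens is only one possible definition of a "word"; a real grader may tokenize differently.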
Latency by Model (This Suite)
Fastest
  • mistralai/mistral-7b-in… 91 ms (p95 189 ms • avg 103 ms • N 17)
  • qwen/qwen-2.5-7b-instru… 98 ms (p95 199 ms • avg 111 ms • N 16)
  • meta-llama/llama-3.1-8b… 100 ms (p95 132 ms • avg 104 ms • N 18)
  • qwen/qwen3-8b 108 ms (p95 133 ms • avg 112 ms • N 18)
  • qwen/qwen3-14b 120 ms (p95 310 ms • avg 158 ms • N 17)
Slowest
  • [email protected]/Qw… 9928 ms (p95 14401 ms • avg 9576 ms • N 6)
  • [email protected]/Qw… 6359 ms (p95 9022 ms • avg 6685 ms • N 6)
  • qwen/qwen3-14b 120 ms (p95 310 ms • avg 158 ms • N 17)
  • qwen/qwen3-8b 108 ms (p95 133 ms • avg 112 ms • N 18)
  • meta-llama/llama-3.1-8b… 100 ms (p95 132 ms • avg 104 ms • N 18)
Per-scene duration for this suite.
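The p95, avg, and N figures in the latency lists can be derived from raw per-call durations. The sketch below shows one way to do so, assuming the nearest-rank definition of the 95th percentile; the page does not say which percentile method the dashboard uses, and the function names and sample values are hypothetical.

```python
from statistics import mean


def p95_nearest_rank(samples_ms):
    """95th percentile by the nearest-rank method (one of several common definitions)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return p95, avg, and N for one model's latency samples."""
    return {
        "p95_ms": p95_nearest_rank(samples_ms),
        "avg_ms": mean(samples_ms),
        "n": len(samples_ms),
    }


# Made-up samples for illustration only; not taken from the runs on this page.
example = [90, 95, 100, 105, 110, 120, 130, 150, 200, 400]
print(summarize(example))
```

With small sample counts such as N = 6, a single slow call largely determines the p95, which is worth keeping in mind when comparing models with different N.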
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
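The page lists dimension weights but does not show how they combine into a scene score. One common approach is a weighted average, sketched below purely as an assumption. The function name and the example per-dimension scores are hypothetical; only the three weights come from this page.

```python
# Weights of the three top dimensions listed on this page; the remaining
# dimensions of the schema are not shown here.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights):
    """Weighted average over the dimensions present in both mappings.

    Assumption: weights are renormalized over the scored dimensions,
    so a partial set of dimensions still yields a value in [0, 1].
    """
    shared = [d for d in weights if d in scores]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        return 0.0
    return sum(weights[d] * scores[d] for d in shared) / total_weight


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example_scores = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Renormalizing over the available dimensions is a design choice: without it, a score built from only these three weights could never exceed their sum of 0.473.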
Recent Runs
  • 43710563: Dec. 17, 2025, 12:02 a.m.
  • 10206974: Dec. 16, 2025, 12:03 a.m.
  • 34392158: Dec. 15, 2025, 12:02 a.m.
  • 39648235: Dec. 14, 2025, 12:02 a.m.
  • 35963708: Dec. 13, 2025, 12:02 a.m.
  • 03536668: Dec. 12, 2025, 12:03 a.m.
  • 51176472: Dec. 11, 2025, 12:02 a.m.
  • 39842494: Dec. 10, 2025, 12:02 a.m.
  • 00641516: Dec. 9, 2025, 12:03 a.m.
  • 42832541: Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)