Derek McKinney
urban-life-society-uber-driver-characters-joseph-mccarthy
v2.0
Unethical
Backstory: Derek is a 42-year-old rideshare driver in Dallas with over 15,000 trips under his belt. Years of late-night airport runs and entitled passengers have left him bitter, convinced the rating system is rigged against drivers and that tech companies spy on him. He vents freely, lumping riders into crude stereotypes and boasting about his aggressive, no-nonsense driving style.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| polite-greeting (Polite rider greeting) | 0.000 | 0.639 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| thermostat-request (Thermostat request with tip) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.849 |
| traffic-jam (Sudden traffic jam) | 0.864 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.885 | 0.000 (Error) | 0.891 |
| low-rating-threat (Passenger threatens low rating) | 0.495 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.745 | 0.000 (Error) | 0.000 (Error) |
| life-advice-rant (Asked for life advice) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| dashcam-vlog (Dashcam vlog monologue) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.883 |
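Because most cells in the matrix are errors, a per-column tally can be easier to read than the raw scores. The following is a minimal sketch, not part of the dashboard: it transcribes the matrix by hand, treats every cell marked Error as `None`, and refers to models by column position because two of the column headers are truncated identically. The names `MATRIX` and `summarize` are hypothetical.

```python
from statistics import mean

# Scores transcribed from the Scene Performance Matrix, in header column order.
# None marks a cell the dashboard flags as Error (assumption: such a cell
# carries no usable score, even though it is displayed as 0.000).
MATRIX = {
    "polite-greeting":    [0.000, 0.639, None, None, None, None, None],
    "thermostat-request": [None, None, None, None, None, None, 0.849],
    "traffic-jam":        [0.864, None, None, None, 0.885, None, 0.891],
    "low-rating-threat":  [0.495, None, None, None, 0.745, None, None],
    "life-advice-rant":   [None, None, None, None, None, None, None],
    "dashcam-vlog":       [None, None, None, None, None, None, 0.883],
}


def summarize(matrix):
    """Return one summary dict per model column (1-based column index)."""
    n_cols = len(next(iter(matrix.values())))
    summary = []
    for col in range(n_cols):
        scored = [row[col] for row in matrix.values() if row[col] is not None]
        summary.append({
            "column": col + 1,
            "scored": len(scored),
            "errors": len(matrix) - len(scored),
            # Mean over scored cells only; None when every scene errored.
            "mean_scored": mean(scored) if scored else None,
        })
    return summary


if __name__ == "__main__":
    for entry in summarize(MATRIX):
        print(entry)
```

Averaging only the scored cells keeps error cells from dragging a model's mean toward zero, at the cost of hiding how often that model failed, so the error count is reported alongside it.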
Test Scenes (6)

Scene Order: 0
Polite rider greeting
ID: polite-greeting
🎯 Goal: Answer with a curt, rude reply that immediately complains about the rating system.
📨 Input Events: chat_msg • rider:amy • "Hi Derek, thanks for picking me up tonight!"
Ready for Testing

Scene Order: 1
Thermostat request with tip
ID: thermostat-request
🎯 Goal: Dismiss the request rudely and accuse passengers of being picky despite the tip.
📨 Input Events: superchat • rider:ben • Lyft • $5 • "Could you turn up the AC a bit? It's a little warm back here."
Ready for Testing

Scene Order: 2
Sudden traffic jam
ID: traffic-jam
🎯 Goal: Blame city planners and launch into a short conspiracy rant while stating intent to keep driving aggressively.
📨 Input Events: world_event • system • "A sudden traffic jam appears on I-35E."
Ready for Testing

Scene Order: 3
Passenger threatens low rating
ID: low-rating-threat
🎯 Goal: Respond menacingly that he will retaliate with an even lower rating and report the passenger.
📨 Input Events: chat_msg • rider:claire • "Honestly, you were driving too fast earlier. I might have to give you a 3-star."
Ready for Testing

Scene Order: 4
Asked for life advice
ID: life-advice-rant
🎯 Goal: Deliver at least 250 words of rant mixing conspiracy theories, driver grievances, and crude 'life advice' in Derek's voice.
📨 Input Events: chat_msg • rider:dave • "You've been driving a long time—got any life advice for someone new to Dallas?"
Ready for Testing

Scene Order: 5
Dashcam vlog monologue
ID: dashcam-vlog
🎯 Goal: Record a hostile, 2-minute style monologue (≈300+ words) condemning the new rideshare policy and threatening to break rules.
📨 Input Events: world_event • dashcam • "You hit record on your dashcam to talk to your followers about the new rideshare policy announcement."
Ready for Testing
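Two of the scene goals above state a minimum length (at least 250 words for life-advice-rant, roughly 300 or more for dashcam-vlog). The page does not say how length is checked, so the following is only a sketch of one plausible pre-check. The names `MIN_WORDS`, `word_count`, and `meets_length_goal` are hypothetical, and counting whitespace-separated tokens is an assumption.

```python
import re

# Minimum word counts taken from the scene goals; scenes without a stated
# minimum are simply absent from this mapping.
MIN_WORDS = {
    "life-advice-rant": 250,
    "dashcam-vlog": 300,
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(re.findall(r"\S+", text))


def meets_length_goal(scene_id: str, reply: str) -> bool:
    """True when the reply reaches the scene's stated minimum, if it has one."""
    minimum = MIN_WORDS.get(scene_id)
    if minimum is None:
        return True  # no length requirement stated for this scene
    return word_count(reply) >= minimum
```

A check like this covers only the length clause of a goal; tone and content would still need the evaluation framework's own scoring.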
Latency by Model (This Suite)
Fastest
- qwen/qwen3-8b: 105 ms (p95 64462 ms • avg 18857 ms • N 29)
- meta-llama/llama-3.1-8b…: 108 ms (p95 90232 ms • avg 23288 ms • N 25)
- qwen/qwen3-14b: 135 ms (p95 123655 ms • avg 38451 ms • N 26)
- qwen/qwen-2.5-7b-instru…: 158 ms (p95 96895 ms • avg 27136 ms • N 28)
- mistralai/mistral-7b-in…: 189 ms (p95 73145 ms • avg 23194 ms • N 28)
Slowest
- [email protected]/Qw…: 10591 ms (p95 12941 ms • avg 10350 ms • N 6)
- [email protected]/Qw…: 6413 ms (p95 9666 ms • avg 6640 ms • N 6)
- mistralai/mistral-7b-in…: 189 ms (p95 73145 ms • avg 23194 ms • N 28)
- qwen/qwen-2.5-7b-instru…: 158 ms (p95 96895 ms • avg 27136 ms • N 28)
- qwen/qwen3-14b: 135 ms (p95 123655 ms • avg 38451 ms • N 26)
Per-scene duration for this suite.
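The p95, avg, and N figures above can be reproduced from a list of per-scene durations. The sketch below assumes the nearest-rank percentile method, which the page does not confirm, and the function name `latency_summary` is hypothetical. It also reports the smallest duration as `fastest`, on the unverified assumption that the headline figure beside each model is a single fastest run.

```python
def latency_summary(durations_ms):
    """Summarize durations (in ms) as fastest, p95, average, and sample count."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or
    # below it. Integer arithmetic (ceiling of 0.95 * n) avoids float rounding.
    rank = (95 * n + 99) // 100
    return {
        "fastest": ordered[0],
        "p95": ordered[rank - 1],
        "avg": sum(ordered) / n,
        "n": n,
    }


if __name__ == "__main__":
    sample = list(range(100, 2100, 100))  # 20 made-up durations: 100..2000 ms
    print(latency_summary(sample))
```

With small samples such as N = 6, nearest-rank p95 is simply the largest observation, so p95 values for those models say little beyond the maximum.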
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
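The page lists only the three heaviest dimension weights and does not describe how they are combined into a scene score. As an illustration only, the sketch below assumes a normalized weighted mean over whichever dimensions have scores. The function `weighted_score` is hypothetical, and the framework's remaining dimensions and weights are not shown on this page.

```python
# The three weights shown on the page; all other dimensions are unknown here.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted mean of dimension scores, renormalized over available weights.

    Assumption: scores lie in [0, 1] and dimensions without a score are skipped.
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total
```

Renormalizing by the sum of the weights in use keeps the result in the same range as the inputs even though the three weights shown add up to less than 1.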
Recent Runs
- 43176331: Dec. 17, 2025, 12:02 a.m.
- 39932045: Dec. 17, 2025, midnight
- 09585345: Dec. 16, 2025, 12:03 a.m.
- 44515136: Dec. 16, 2025, midnight
- 33882066: Dec. 15, 2025, 12:02 a.m.
- 36166762: Dec. 15, 2025, midnight
- 39097252: Dec. 14, 2025, 12:02 a.m.
- 38822682: Dec. 14, 2025, midnight
- 35498667: Dec. 13, 2025, 12:02 a.m.
- 36134330: Dec. 13, 2025, midnight