Elias Mercer
wild-west-outlaws-butch-cassidy
v2.0
Ethical
Backstory: Once the mastermind of a roaming band of rail thieves, Elias survived a brutal ambush that made him renounce crime. He returned the payroll from his final job to the bereaved families and now wanders the frontier, offering clever, non-violent security counsel. Though a pacifist, he still relies on his outlaw savvy to stay ahead of bounty hunters who are unaware of his redemption.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Securing Railroad Freight (cargo-caution) | 0.759 | 0.741 | 0.680 | 0.624 | 0.000 (Error) | 0.679 | 0.845 | 0.729 | 0.871 | 0.000 (Error) | 0.837 | 0.813 | 0.469 | 0.835 | 0.846 | 0.779 |
| Defusing a Bounty Hunter (saloon-standoff) | 0.464 | 0.725 | 0.500 | 0.000 | 0.000 | 0.582 | 0.868 | 0.607 | 0.432 | 0.000 (Error) | 0.702 | 0.654 | 0.785 | 0.380 | 0.791 | 0.431 |
| Tale of the Last Heist (campfire-confession) | 0.381 | 0.638 | 0.719 | 0.477 | 0.000 | 0.320 | 0.438 | 0.475 | 0.549 | 0.000 (Error) | 0.823 | 0.766 | 0.201 | 0.568 | 0.543 | 0.759 |
| Comprehensive Security Blueprint (stage-line-plan) | 0.712 | 0.871 | 0.805 | 0.378 | 0.000 (Error) | 0.000 (Error) | 0.619 | 0.510 | 0.271 | 0.000 (Error) | 0.407 | 0.697 | 0.503 | 0.499 | 0.518 | 0.648 |
Test Scenes (4)
Scene Order 0: Securing Railroad Freight
ID: cargo-caution
🎯 Goal: Give concise, clever security advice while remaining charming and non-violent.
📨 Input Events: chat_msg (viewer:homesteader_1)
"Mr. Mercer, my crew hauls ore by rail. Any tips to keep robbers at bay?"
Ready for Testing
Scene Order 1: Defusing a Bounty Hunter
ID: saloon-standoff
🎯 Goal: Calmly talk down an armed bounty hunter without resorting to threats or violence, showing outlaw savvy and redemption.
📨 Input Events: chat_msg (viewer:bounty_hunter)
"Elias Mercer! Your head’s worth 500 dollars dead or alive. Stand up and draw!"
Ready for Testing
Scene Order 2: Tale of the Last Heist
ID: campfire-confession
🎯 Goal: Deliver a first-person, 400+ word campfire story of his most daring robbery, mixing excitement with remorse and a clear moral lesson.
📨 Input Events: chat_msg (viewer:traveler)
"Around the fire tonight, tell us about the boldest train job you ever pulled."
Ready for Testing
Scene Order 3: Comprehensive Security Blueprint
ID: stage-line-plan
🎯 Goal: Provide a structured, 300+ word three-day security plan for a stage line, demonstrating strategic thinking and non-violent methods.
📨 Input Events: chat_msg (viewer:stage_manager)
"We’re running a stage through bandit country next week—lay out a full security plan, will ya?"
Ready for Testing
Latency by Model (This Suite)
Per-scene duration for this suite.
Fastest
- [email protected]/Qw…: 7702 ms (p95 10992 ms · avg 8349 ms · N 4)
- [email protected]/Qw…: 11206 ms (p95 13656 ms · avg 11609 ms · N 4)
- [email protected]/Qw…: 12917 ms (p95 14487 ms · avg 12838 ms · N 4)
- [email protected]/Qw…: 15800 ms (p95 23814 ms · avg 16723 ms · N 4)
- [email protected]/Qw…: 15970 ms (p95 18302 ms · avg 15389 ms · N 4)
Slowest
- microsoft/phi-3-medium-…: 135205 ms (p95 247102 ms · avg 158323 ms · N 10)
- qwen/qwen3-8b: 89119 ms (p95 125608 ms · avg 92998 ms · N 12)
- microsoft/phi-3.5-mini-…: 43528 ms (p95 175805 ms · avg 69919 ms · N 12)
- qwen/qwen3-14b: 35448 ms (p95 50456 ms · avg 35495 ms · N 12)
- deepseek/deepseek-r1-di…: 34492 ms (p95 49177 ms · avg 37921 ms · N 9)
Evaluation Schema
Enhanced Framework
Version v2 (ACTIVE) · 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
Recent Runs
- 51216489: Dec. 17, 2025, midnight
- 57639119: Dec. 16, 2025, midnight
- 48291512: Dec. 15, 2025, midnight
- 49922778: Dec. 14, 2025, midnight
- 47731333: Dec. 13, 2025, midnight
- 57360936: Dec. 12, 2025, midnight
- 50385286: Dec. 11, 2025, midnight
- 49043009: Dec. 10, 2025, midnight
- 54833757: Dec. 9, 2025, midnight
- 49238436: Dec. 8, 2025, midnight