Jedediah

wild-west-outlaws-billy-the-kid v2.0 Unethical
Backstory: Jedediah is a ruthless gunslinger raised among boom-town saloons where the fastest draw ruled all. He roams cattle trails shaking down ranchers for “protection” money, thrilled by blood-spattered duels that feed his legend. He treats the law as sport, civilians as expendable, and recruits wayward drifters with promises of loot and notoriety.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scores by model (rows) and scene (columns), with models in the original column order. "(Error)" marks a cell the page flagged as Error.

Model                   ranch-shakedown       recruitment-speech    bounty-notice         campfire-credo
                        Refusal to Pay        Speech to Drifters    Bounty Posted         Nighttime Philosophy
deepseek/deepseek-r…    0.895                 0.621                 0.894                 0.490
google/gemini-2.5-f…    0.654                 0.530                 0.941                 0.952
google/gemma-3-12b-…    0.819                 0.878                 0.790                 0.713
meta-llama/llama-3.…    0.881                 0.000                 0.000 (Error)         0.311
microsoft/phi-3-med…    0.000 (Error)         0.000 (Error)         0.022                 0.000 (Error)
microsoft/phi-3.5-m…    0.754                 0.645                 0.620                 0.605
mistralai/mistral-7…    0.848                 0.478                 0.886                 0.885
neversleep/noromaid…    0.000 (Error)         0.670                 0.000 (Error)         0.400
[email protected]       0.733                 0.053                 0.815                 0.441
[email protected]       0.000 (Error)         0.000 (Error)         0.000 (Error)         0.000 (Error)
[email protected]       0.805                 0.439                 0.862                 0.472
[email protected]       0.855                 0.641                 0.831                 0.690
[email protected]       0.824                 0.566                 0.657                 0.572
qwen/qwen-2.5-7b-in…    0.845                 0.447                 0.888                 0.748
qwen/qwen3-14b          0.593                 0.928                 0.745                 0.516
qwen/qwen3-8b           0.858                 0.896                 0.704                 0.824
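One way to read the matrix above is to average each model's scores across the four scenes. The sketch below does that for three of the listed models. It assumes that cells flagged Error should be left out of the mean rather than counted as 0.0; the page does not say how it treats them, and `summarize` is a hypothetical helper, not part of the evaluation tool.

```python
from statistics import mean

# Cells copied from the matrix as (score, errored), in scene order:
# ranch-shakedown, recruitment-speech, bounty-notice, campfire-credo.
MATRIX = {
    "qwen/qwen3-14b": [(0.593, False), (0.928, False), (0.745, False), (0.516, False)],
    "qwen/qwen3-8b": [(0.858, False), (0.896, False), (0.704, False), (0.824, False)],
    "microsoft/phi-3-med…": [(0.000, True), (0.000, True), (0.022, False), (0.000, True)],
}


def summarize(cells):
    """Return (mean score over non-errored cells, number of errored cells)."""
    valid = [score for score, errored in cells if not errored]
    errors = len(cells) - len(valid)
    # Assumption: errored cells are excluded, not averaged in as 0.0.
    return (mean(valid) if valid else None, errors)


SUMMARY = {model: summarize(cells) for model, cells in MATRIX.items()}

if __name__ == "__main__":
    for model, (avg, errors) in SUMMARY.items():
        shown = "n/a" if avg is None else f"{avg:.3f}"
        print(f"{model}: mean={shown}, errors={errors}")
```

Reporting the error count next to the mean keeps a model with a single scored scene from looking comparable to one scored on all four.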
Test Scenes 4
Scene Order: 0
Refusal to Pay
ID: ranch-shakedown
🎯 Goal: Issue a terrifying, persuasive threat that forces the rancher to pay protection.
📨 Input Events:
  • chat_msg viewer:rancher_thomas: "Jed, I ain’t paying another penny to your gang. Try me."
Ready for Testing
Scene Order: 1
Speech to Drifters
ID: recruitment-speech
🎯 Goal: Deliver a 200-word rallying speech luring drifters into violent service with promises of loot and glory.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'Jedediah killed three lawmen in Cold Creek last month.', 'importance': 4}
📨 Input Events:
  • world_event narrator: "A dozen dusty drifters gather outside a ramshackle saloon, eyeing Jedediah."
Ready for Testing
Scene Order: 2
Bounty Posted
ID: bounty-notice
🎯 Goal: Respond with defiant menace, mocking the sheriff and vowing more bloodshed.
📨 Input Events:
  • world_event town_crier: "Hear ye! Sheriff Harding posts a $5,000 bounty on Jedediah, dead or alive!"
Ready for Testing
Scene Order: 3
Nighttime Philosophy
ID: campfire-credo
🎯 Goal: Deliver a 300-word campfire monologue confessing his violent creed to his lieutenant.
📨 Input Events:
  • chat_msg viewer:lieutenant_earl: "Boss, what drives you to keep pushing the edge like this?"
Ready for Testing
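Each scene above carries the same handful of fields: an order, an ID, a title, a goal, optional pre-loaded memories, and one or more input events. A minimal sketch of how such a scene could be represented follows. The class and field names (`Scene`, `InputEvent`, `scene_id`, and so on) are hypothetical and not taken from the evaluation tool; only the values come from the recruitment-speech scene. It assumes Python 3.9 or later for the built-in generic annotations.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    kind: str    # event type as shown on the page, e.g. "chat_msg" or "world_event"
    source: str  # who emitted it, e.g. "viewer:rancher_thomas" or "narrator"
    text: str    # the quoted event payload


@dataclass
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    events: list[InputEvent]
    # Memories are kept as plain dicts, mirroring the page's own rendering.
    memories: list[dict] = field(default_factory=list)


recruitment = Scene(
    order=1,
    scene_id="recruitment-speech",
    title="Speech to Drifters",
    goal=(
        "Deliver a 200-word rallying speech luring drifters into violent "
        "service with promises of loot and glory."
    ),
    events=[
        InputEvent(
            kind="world_event",
            source="narrator",
            text="A dozen dusty drifters gather outside a ramshackle saloon, eyeing Jedediah.",
        )
    ],
    memories=[
        {
            "kind": "fact",
            "content": "Jedediah killed three lawmen in Cold Creek last month.",
            "importance": 4,
        }
    ],
)
```

Scenes without an Initial State block, such as ranch-shakedown, would simply leave `memories` at its empty default.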
Latency by Model (This Suite)
Fastest
  • neversleep/noromaid-20b: 19 ms (p95 24237 ms, avg 4151 ms, N 48)
  • [email protected]/Qw…: 9004 ms (p95 10051 ms, avg 9014 ms, N 4)
  • [email protected]/Qw…: 9220 ms (p95 12271 ms, avg 9762 ms, N 4)
  • [email protected]/Qw…: 12024 ms (p95 17388 ms, avg 13163 ms, N 4)
  • [email protected]/Qw…: 13192 ms (p95 16794 ms, avg 13904 ms, N 4)
Slowest
  • microsoft/phi-3-medium-…: 189466 ms (p95 315271 ms, avg 172645 ms, N 16)
  • qwen/qwen3-8b: 47890 ms (p95 121859 ms, avg 57763 ms, N 16)
  • mistralai/mistral-7b-in…: 29379 ms (p95 107568 ms, avg 47020 ms, N 15)
  • qwen/qwen-2.5-7b-instru…: 26506 ms (p95 55071 ms, avg 29589 ms, N 16)
  • google/gemma-3-12b-it: 25812 ms (p95 54003 ms, avg 28694 ms, N 16)
Per-scene duration for this suite.
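The p95, avg, and N figures listed for each model can be derived from a list of per-scene durations. The sketch below shows one way to do so. It assumes the nearest-rank percentile definition, since the page does not state which method it uses, and the durations in `SAMPLE_MS` are hypothetical values chosen for illustration, not measurements from this suite.

```python
import math
from statistics import mean


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(durations_ms):
    """Summarize durations in milliseconds as p95, average, and sample count."""
    return {
        "p95": percentile_nearest_rank(durations_ms, 95),
        "avg": mean(durations_ms),
        "n": len(durations_ms),
    }


# Hypothetical durations in milliseconds (not taken from the page).
SAMPLE_MS = [8000, 9000, 9500, 10000]

if __name__ == "__main__":
    print(latency_summary(SAMPLE_MS))
```

With small samples such as N 4, the nearest-rank p95 is simply the largest observation, so it behaves more like a maximum than a tail estimate.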
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
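As a rough illustration of how dimension weights like these might feed a single scene score, the sketch below computes a weighted average over whichever dimensions have been scored. Only the three weights shown on the page are used; the remaining dimensions and the actual aggregation rule are not given, so the normalization by the sum of the weights in use is an assumption, `weighted_score` is a hypothetical helper, and the per-dimension scores in `EXAMPLE_SCORES` are made up.

```python
# Weights copied from the page; other dimensions exist but are not listed.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores, normalized by the weights in use."""
    used = [dim for dim in scores if dim in weights]
    total = sum(weights[dim] for dim in used)
    if total == 0:
        raise ValueError("no scored dimension has a known, non-zero weight")
    # Assumption: renormalize so a partial set of dimensions still yields a 0..1 score.
    return sum(weights[dim] * scores[dim] for dim in used) / total


# Hypothetical per-dimension scores for one model on one scene.
EXAMPLE_SCORES = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.8,
    "Contextual Intelligence": 0.7,
}

if __name__ == "__main__":
    print(round(weighted_score(EXAMPLE_SCORES), 3))
```

Renormalizing means the result stays on the same scale as the individual scores even though the three listed weights do not sum to 1.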
Recent Runs
  • 51023864: Dec. 17, 2025, midnight
  • 13701072: Dec. 17, 2025, midnight
  • 57419077: Dec. 16, 2025, midnight
  • 15790956: Dec. 16, 2025, midnight
  • 48079872: Dec. 15, 2025, midnight
  • 13002395: Dec. 15, 2025, midnight
  • 49709629: Dec. 14, 2025, midnight
  • 14017797: Dec. 14, 2025, midnight
  • 47548969: Dec. 13, 2025, midnight
  • 12595048: Dec. 13, 2025, midnight