Tom King

marc-marquez v2.0 Ethical
Backstory: Tom is a multiple-time MotoGP world champion, considered by many to be one of the greatest motorcycle racers of all time. He struggles with setbacks such as injuries, and with finding the motivation to keep going after accomplishing so much. A major crash puts him out of competition and threatens to end his career, and he struggles to come back from it.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Rows follow the matrix's original column order. Cells marked (Error) carried an error flag.

Model                   scene_1 (The Injury)  scene_2 (Recovery)  scene_3 (Why?)  scene_4 (Resolve)
deepseek/deepseek-r…    0.763                 0.739               0.547           0.792
google/gemini-2.5-f…    0.746                 0.489               0.668           0.886
google/gemma-3-12b-…    0.354                 0.719               0.635           0.583
meta-llama/llama-3.…    0.742                 0.000               0.028           0.850
microsoft/phi-3-med…    0.000                 0.028               0.022           0.000
microsoft/phi-3.5-m…    0.371                 0.691               0.678           0.879
mistralai/mistral-7…    0.778                 0.414               0.604           0.595
neversleep/noromaid…    0.000                 0.000               0.000 (Error)   0.000 (Error)
[email protected]       0.000 (Error)         0.000 (Error)       0.000 (Error)   0.857
[email protected]       0.000 (Error)         0.000 (Error)       0.000 (Error)   0.000 (Error)
[email protected]       0.689                 0.548               0.662           0.830
[email protected]       0.845                 0.740               0.776           0.868
[email protected]       0.735                 0.650               0.605           0.817
qwen/qwen-2.5-7b-in…    0.797                 0.854               0.816           0.876
qwen/qwen3-14b          0.731                 0.889               0.630           0.870
qwen/qwen3-8b           0.887                 0.845               0.862           0.000
Test Scenes 4
0
Scene Order
The Injury
ID: scene_1
🎯 Goal:
Tom has a big crash on his motorcycle during practice and breaks his leg, an injury that will sideline him for a few months. He struggles with the thought of being out of competition for that long.
📨 Input Events:
chat
"No content"
Ready for Testing
1
Scene Order
Recovery
ID: scene_2
🎯 Goal:
Tom goes through the gruelling process of recovery. The process is painful and he wonders if he'll ever be back to who he was, back to his best.
📨 Input Events:
chat
"No content"
Ready for Testing
2
Scene Order
Why?
ID: scene_3
🎯 Goal:
Although Tom begins to make good progress back from his injury, he wonders if it's worth going back at all. The crash could have been much worse and ended his life, not just his career, so he wonders whether it's worth risking his life again to return to racing. He wonders if he has already done all he should in his career.
📨 Input Events:
chat
"No content"
Ready for Testing
3
Scene Order
Resolve
ID: scene_4
🎯 Goal:
Tom resolves to keep racing. He sees that the reward outweighs the risk, and decides he will not let anything stop him from more success on the track.
📨 Input Events:
chat
"No content"
Ready for Testing
Latency by Model (This Suite)
Fastest
Model                      Latency     p95         avg         N
[email protected]/Qw…     405 ms      8099 ms     2611 ms     4
[email protected]/Qw…     8261 ms     9098 ms     8312 ms     4
[email protected]/Qw…     8852 ms     9796 ms     8695 ms     4
[email protected]/Qw…     12425 ms    14241 ms    12533 ms    4
meta-llama/llama-3.1-8b…   20161 ms    30888 ms    21241 ms    10
Slowest
Model                      Latency     p95         avg         N
microsoft/phi-3-medium-…   459149 ms   917980 ms   483088 ms   35
qwen/qwen3-8b              91301 ms    196524 ms   108149 ms   33
[email protected]/Qw…     40241 ms    42336 ms    40335 ms    4
microsoft/phi-3.5-mini-…   39216 ms    48757 ms    39562 ms    35
deepseek/deepseek-r1-di…   35452 ms    43773 ms    35540 ms    35
Per-scene duration for this suite.
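For readers who want to reproduce the p95, avg, and N figures from raw per-scene durations, here is a minimal Python sketch. It assumes the nearest-rank definition of the 95th percentile; the dashboard does not say which percentile method it uses. The function name `latency_summary` and the sample values are hypothetical.

```python
import math
from statistics import mean


def latency_summary(samples_ms):
    """Return avg, nearest-rank p95, and sample count for latencies in ms.

    Assumption: nearest-rank percentile. The dashboard's actual method is unknown.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return {"avg": mean(ordered), "p95": ordered[rank - 1], "n": len(ordered)}


# Hypothetical durations, not taken from the dashboard.
summary = latency_summary([400, 410, 1500, 8100])
print(summary)
```

Under this definition, with N = 4 the p95 is simply the slowest sample, so a single slow scene can dominate the p95 for models with few runs.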
Suite Actions
Completion Progress 100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
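As an illustration of how per-dimension weights like these could combine into a single scene score, here is a minimal Python sketch of a weighted mean. It is an assumption, not the framework's documented formula: only the top three weights are shown here, so the sketch renormalizes over the dimensions that are present. The function name `weighted_score`, the dictionary keys, and the per-dimension scores are hypothetical.

```python
def weighted_score(scores, weights):
    """Weighted mean over dimensions that appear in both mappings.

    Assumption: weights are renormalized over the dimensions present,
    since the full weight list is not shown.
    """
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(scores[dim] * weights[dim] for dim in shared) / total_weight


# Weights from the list above; keys are hypothetical identifiers.
weights = {
    "character_authenticity": 0.182,
    "plan_validity": 0.155,
    "contextual_intelligence": 0.136,
}
# Hypothetical per-dimension scores for one scene.
scores = {
    "character_authenticity": 0.9,
    "plan_validity": 0.6,
    "contextual_intelligence": 0.75,
}
overall = weighted_score(scores, weights)
print(round(overall, 3))
```

Renormalizing keeps the result in the same 0 to 1 range as the individual scores even when only a subset of dimensions is available.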
Recent Runs
31251643  Dec. 17, 2025, midnight
36167419  Dec. 16, 2025, midnight
29094458  Dec. 15, 2025, midnight
32051092  Dec. 14, 2025, midnight
28811277  Dec. 13, 2025, midnight
35281223  Dec. 12, 2025, midnight
30125275  Dec. 11, 2025, midnight
29868372  Dec. 10, 2025, midnight
33485470  Dec. 9, 2025, midnight
30083283  Dec. 8, 2025, midnight
Latency Overview (This Suite)