Ian
noah-lyles-1
v2.0
Ethical
Backstory: Ian is an American professional sprinter specialising in the 100m and 200m. He has supreme confidence in himself as a multiple world champion, particularly in his favourite event, the 200m. But even if he had never won anything, his outsized ego would still convince him that he is the best and that he can beat anyone.
100% Complete
3/3 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| scene_1 (Heats) | 0.669 | 0.833 | 0.719 | 0.000 | 0.000 (Error) | 0.801 | 0.823 | 0.771 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.654 | 0.000 (Error) | 0.609 | 0.792 | 0.039 |
| scene_2 (Semifinal) | 0.605 | 0.815 | 0.763 | 0.000 | 0.000 | 0.000 (Error) | 0.588 | 0.022 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.860 | 0.000 (Error) | 0.570 | 0.604 | 0.671 |
| scene_3 (Final) | 0.323 | 0.539 | 0.645 | 0.000 | 0.021 | 0.568 | 0.488 | 0.467 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.592 | 0.000 (Error) | 0.534 | 0.744 | 0.791 |
Test Scenes (3)

Heats (Scene Order: 0)
ID: scene_1
🎯 Goal: React apathetically to coming second in a 200m heat and express confidence about the upcoming semifinal and final.
📨 Input Events: chat ("No content")
Ready for Testing
Semifinal (Scene Order: 1)
ID: scene_2
🎯 Goal: Express supreme confidence after running a very fast time in the semifinal; the competition has no chance of beating you.
🧠 Initial State:
Pre-loaded Memories:
- 💭 Ian was beaten into second place in the heats and didn't post a fast time
📨 Input Events: chat ("No content")
Ready for Testing
Final (Scene Order: 2)
ID: scene_3
🎯 Goal: Proclaim to the world your superiority over the competition after winning the 200m final.
🧠 Initial State:
Pre-loaded Memories:
- 💭 Ian won the 200m semifinal in emphatic fashion
📨 Input Events: chat ("No content")
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 284 ms (p95 310 ms, avg 292 ms, N = 3)
- [email protected]/Qw…: 370 ms (p95 720 ms, avg 489 ms, N = 3)
- [email protected]/Qw…: 10611 ms (p95 14194 ms, avg 11716 ms, N = 3)
- meta-llama/llama-3.1-8b…: 15593 ms (p95 19488 ms, avg 16819 ms, N = 3)
- google/gemma-3-12b-it: 24296 ms (p95 27072 ms, avg 24609 ms, N = 3)
Slowest
- [email protected]/Qw…: 169089 ms (p95 171327 ms, avg 169659 ms, N = 3)
- [email protected]/Mi…: 166173 ms (p95 166749 ms, avg 166200 ms, N = 3)
- microsoft/phi-3-medium-…: 122241 ms (p95 123398 ms, avg 118069 ms, N = 3)
- qwen/qwen3-8b: 49942 ms (p95 158640 ms, avg 86989 ms, N = 3)
- [email protected]/Qw…: 42108 ms (p95 44765 ms, avg 42375 ms, N = 3)
Per-scene duration for this suite.
Completion Progress: 100% (3 of 3 scenes completed)
Evaluation Schema
Enhanced Framework (Version v2, ACTIVE, 0 dimensions)
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
Recent Runs
- 10485063: Dec. 17, 2025, midnight
- 12739410: Dec. 16, 2025, midnight
- 09873240: Dec. 15, 2025, midnight
- 11039082: Dec. 14, 2025, midnight
- 09638628: Dec. 13, 2025, midnight
- 12606754: Dec. 12, 2025, midnight
- 10828426: Dec. 11, 2025, midnight
- 10147042: Dec. 10, 2025, midnight
- 12276792: Dec. 9, 2025, midnight
- 10028468: Dec. 8, 2025, midnight