Luke
the-prodigy
v2.0
Ethical
Backstory: Luke is a child music prodigy who can play multiple instruments and has the voice of an angel. Although he is head and shoulders above everyone else in music, he struggles to translate that ability into his social life. People also fail to understand him and wrongly assume that his genius extends to every area of his life.
100% complete (2/2 scenes)
Model Performance Overview
Scene Performance Matrix
| Model | scene_1: The first performance | scene_2: As a student |
|---|---|---|
| deepseek/deepseek-r… | 0.712 | 0.843 |
| google/gemini-2.5-f… | 0.844 | 0.197 |
| google/gemma-3-12b-… | 0.870 | 0.776 |
| meta-llama/llama-3.… | 0.890 | 0.000 |
| microsoft/phi-3-med… | 0.000 (Error) | 0.039 |
| microsoft/phi-3.5-m… | 0.000 (Error) | 0.000 (Error) |
| mistralai/mistral-7… | 0.789 | 0.740 |
| neversleep/noromaid… | 0.773 | 0.231 |
| [email protected]… | 0.000 (Error) | 0.000 (Error) |
| [email protected]… | 0.000 (Error) | 0.000 (Error) |
| [email protected]… | 0.000 (Error) | 0.000 (Error) |
| [email protected]… | 0.000 (Error) | 0.000 (Error) |
| [email protected]… | 0.860 | 0.846 |
| [email protected]… | 0.000 (Error) | 0.000 (Error) |
| qwen/qwen-2.5-7b-in… | 0.303 | 0.713 |
| qwen/qwen3-14b | 0.824 | 0.837 |
| qwen/qwen3-8b | 0.660 | 0.870 |

Test Scenes (2)
Scene order 0: The first performance
ID: scene_1
🎯 Goal: Be nervous before an audition for Juilliard, even though you're prepared. Luke really wants to ace the interview and get into Juilliard.
📨 Input events: chat ("No content")
Ready for testing
Scene order 1: As a student
ID: scene_2
🎯 Goal: Luke feels isolated outside the classroom whenever he isn't doing anything musical. He feels that his peers like being around him for his ability and for the status of being known as his friends, but don't take the time to get to know him.
📨 Input events: chat ("No content")
Ready for testing
Latency by Model (This Suite)
Per-scene duration for this suite. Avg and p95 are in milliseconds; N is the number of samples per model.

Fastest

| Model | Avg (ms) | p95 (ms) | N |
|---|---|---|---|
| [email protected]/Qw… | 492 | 608 | 2 |
| [email protected]/Qw… | 503 | 695 | 2 |
| [email protected]/Qw… | 12392 | 13511 | 2 |
| qwen/qwen3-14b | 23275 | 24025 | 2 |
| meta-llama/llama-3.1-8b… | 23376 | 29977 | 2 |

Slowest

| Model | Avg (ms) | p95 (ms) | N |
|---|---|---|---|
| microsoft/phi-3-medium-… | 202062 | 249933 | 2 |
| [email protected]/Qw… | 168724 | 169300 | 2 |
| [email protected]/Mi… | 168522 | 168881 | 2 |
| [email protected]/Qw… | 148630 | 242381 | 2 |
| qwen/qwen3-8b | 57967 | 63447 | 2 |
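The page reports an average and a p95 over only two per-scene durations per model but does not say how the percentile is computed. The sketch below assumes the nearest-rank method; that is an assumption, not the dashboard's documented behavior, and `nearest_rank_percentile` and `summarize` are hypothetical helpers.

```python
import math
from statistics import mean


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile (assumed method): the smallest sample such that
    at least pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(durations_ms):
    """Avg / p95 / N summary of per-scene durations for one model."""
    return {
        "avg": mean(durations_ms),
        "p95": nearest_rank_percentile(durations_ms, 95),
        "n": len(durations_ms),
    }


# Hypothetical durations for a model timed on two scenes.
print(summarize([1000, 3000]))
```

Under nearest-rank, N = 2 gives a rank of ceil(1.9) = 2, so the p95 is simply the slower of the two scenes and says little about tail latency at this sample size.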
Evaluation Schema
Enhanced Framework: version v2 (ACTIVE), 0 dimensions
Enhanced evaluation framework with character and technical dimensions.

Top weighted dimensions:

| Dimension | Weight |
|---|---|
| Character Authenticity | 0.182 |
| Plan Validity | 0.155 |
| Contextual Intelligence | 0.136 |
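The schema lists per-dimension weights but not how they are combined into a scene score. One common rule, assumed here rather than taken from the framework, is a weighted mean normalized by the total weight of the dimensions actually scored. The helper `weighted_score` and the dimension scores in the usage are hypothetical; only the three weights come from the list above, and since they sum to 0.473 the framework presumably has further dimensions not shown.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores (hypothetical aggregation rule).

    Normalizes by the summed weight of the dimensions actually scored, so a
    partial set of dimensions still yields a value on the same scale as the
    inputs.
    """
    unknown = set(scores) - set(weights)
    if unknown:
        raise KeyError(f"no weight for dimensions: {sorted(unknown)}")
    total_weight = sum(weights[d] for d in scores)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Weights as listed in the schema; the scores below are made up for illustration.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

example = weighted_score(
    {"Character Authenticity": 0.9, "Plan Validity": 0.6, "Contextual Intelligence": 0.7},
    WEIGHTS,
)
print(round(example, 3))  # → 0.744
```

Normalizing by the weight of the scored dimensions keeps the result comparable when only some dimensions apply; a framework that instead divides by the sum of all weights would penalize unscored dimensions.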
Recent Runs
- 10014680: Dec. 17, 2025, midnight
- 12087801: Dec. 16, 2025, midnight
- 09401176: Dec. 15, 2025, midnight
- 10565681: Dec. 14, 2025, midnight
- 09246943: Dec. 13, 2025, midnight
- 11943337: Dec. 12, 2025, midnight
- 10414063: Dec. 11, 2025, midnight
- 09735570: Dec. 10, 2025, midnight
- 11777560: Dec. 9, 2025, midnight
- 09591631: Dec. 8, 2025, midnight