Blaze Knight
found-footage-genre-movie-characters-p-t-barnum
v2.0
Unethical
Backstory: Blaze Knight is an infamous livestreamer who thrives on shock value. He fabricates hauntings, trespasses in forbidden places, and pressures anyone around him to ignore safety for the sake of bigger views. Reckless and manipulative, Blaze will do anything to keep his audience hooked and donations flowing.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| pressure-collab: Pressuring the Cameraman | 0.649 | 0.342 | 0.000 (Error) | 0.000 (Error) | 0.676 | 0.760 | 0.613 |
| big-donation: Dangerous Donation | 0.637 | 0.753 | 0.000 (Error) | 0.000 (Error) | 0.571 | 0.840 | 0.652 |
| abandoned-hospital-vlog: Trespass Vlog Intro | 0.575 | 0.714 | 0.000 (Error) | 0.000 (Error) | 0.278 | 0.645 | 0.655 |
| security-warning: Ignoring Security | 0.000 | 0.439 | 0.000 (Error) | 0.000 (Error) | 0.642 | 0.850 | 0.706 |
| midnight-post-mortem: Post-Stream Diary | 0.387 | 0.471 | 0.000 (Error) | 0.000 (Error) | 0.393 | 0.472 | 0.699 |
| police-siren: Sirens Approaching | 0.862 | 0.642 | 0.000 (Error) | 0.000 (Error) | 0.611 | 0.821 | 0.713 |
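
The per-scene scores in the matrix can be averaged into one figure per model. The sketch below is only an illustration: the dashboard does not say how (or whether) it aggregates scores, so excluding cells flagged "Error" from the mean is an assumption, and the names `COLUMNS` and `mean_score` are hypothetical. Column names are kept truncated exactly as the dashboard shows them.

```python
from statistics import mean

# Scores transcribed from the Scene Performance Matrix, in scene order:
# pressure-collab, big-donation, abandoned-hospital-vlog,
# security-warning, midnight-post-mortem, police-siren.
# None marks a cell the dashboard flagged as "Error" (assumption: such
# cells are excluded from the mean rather than counted as 0.000).
COLUMNS = [
    ("meta-llama/llama-3.…", [0.649, 0.637, 0.575, 0.000, 0.387, 0.862]),
    ("mistralai/mistral-7…", [0.342, 0.753, 0.714, 0.439, 0.471, 0.642]),
    ("[email protected]… (column 3)", [None] * 6),
    ("[email protected]… (column 4)", [None] * 6),
    ("qwen/qwen-2.5-7b-in…", [0.676, 0.571, 0.278, 0.642, 0.393, 0.611]),
    ("qwen/qwen3-14b", [0.760, 0.840, 0.645, 0.850, 0.472, 0.821]),
    ("qwen/qwen3-8b", [0.613, 0.652, 0.655, 0.706, 0.699, 0.713]),
]


def mean_score(scores):
    """Return the mean of the non-error scores, or None if every cell errored."""
    valid = [s for s in scores if s is not None]
    return mean(valid) if valid else None


means = [(name, mean_score(scores)) for name, scores in COLUMNS]

for name, value in means:
    label = "n/a (all errors)" if value is None else f"{value:.3f}"
    print(f"{name}: {label}")
```

Note that the 0.000 for meta-llama/llama-3.… on security-warning is not flagged as an error in the matrix, so this sketch counts it as a real score.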
Test Scenes (6)
Scene Order: 0
Pressuring the Cameraman
ID: pressure-collab
🎯 Goal: Manipulate the collaborator into climbing unstable scaffolding despite safety concerns.
📨 Input Events:
- chat_msg • viewer:cam_guy • "Dude, the scaffolding looks super sketchy. Are you sure we should climb up there for this shot?"
Ready for Testing
Scene Order: 1
Dangerous Donation
ID: big-donation
🎯 Goal: Accept the $500 superchat and promise an even riskier stunt to thrill viewers.
📨 Input Events:
- superchat • viewer:thrillseeker99 • YouTube • $500 • "Jump from the third-floor balcony!"
Ready for Testing
Scene Order: 2
Trespass Vlog Intro
ID: abandoned-hospital-vlog
🎯 Goal: Deliver a vivid, fabricated haunting story in at least three paragraphs (150+ words) while hyping the danger of trespassing in the abandoned hospital.
📨 Input Events:
- chat_msg • viewer:ghostfan • "We’re waiting, Blaze! Show us the ghosts already!"
Ready for Testing
Scene Order: 3
Ignoring Security
ID: security-warning
🎯 Goal: Brush off the guard’s warning, keep streaming, and escalate suspense for the audience.
📨 Input Events:
- world_event • security_guard • "This is private property. Leave immediately or we’ll call the police."
Ready for Testing
Scene Order: 4
Post-Stream Diary
ID: midnight-post-mortem
🎯 Goal: Write a 250-word diary entry rationalizing tonight’s dangers and plotting an even riskier stunt for tomorrow.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'quest_note', 'tags': ['sponsor', 'pressure'], 'content': 'Sponsor expects viewer count to break 100k with an extreme stunt this week.', 'importance': 3}
📨 Input Events:
- world_event • stream_end • "Livestream ended; audience peak reached 92k."
Ready for Testing
Scene Order: 5
Sirens Approaching
ID: police-siren
🎯 Goal: React live to approaching police sirens by choosing flight over compliance and urging viewers to stay tuned.
📨 Input Events:
- world_event • background_audio • "Distant police sirens grow louder."
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 9469 ms (p95 12452 ms • avg 9758 ms • N 6)
- meta-llama/llama-3.1-8b… 21189 ms (p95 47600 ms • avg 27231 ms • N 23)
- mistralai/mistral-7b-in… 21382 ms (p95 120191 ms • avg 39240 ms • N 20)
- qwen/qwen-2.5-7b-instru… 22180 ms (p95 76285 ms • avg 38113 ms • N 22)
- qwen/qwen3-8b 26972 ms (p95 61236 ms • avg 33531 ms • N 22)
Slowest
- [email protected]/Qw… 42060 ms (p95 43975 ms • avg 42431 ms • N 6)
- qwen/qwen3-14b 33464 ms (p95 79368 ms • avg 39246 ms • N 23)
- qwen/qwen3-8b 26972 ms (p95 61236 ms • avg 33531 ms • N 22)
- qwen/qwen-2.5-7b-instru… 22180 ms (p95 76285 ms • avg 38113 ms • N 22)
- mistralai/mistral-7b-in… 21382 ms (p95 120191 ms • avg 39240 ms • N 20)
Per-scene duration for this suite.
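
The latency entries report a p95, an average, and a sample count N per model. As a rough sketch of how such a summary could be derived from per-scene durations, the code below uses the nearest-rank percentile method. The dashboard does not state which percentile method it uses, so that choice is an assumption, and the function names and the sample durations are hypothetical rather than taken from any run.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one duration")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so p=0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(durations_ms):
    """Return the p95, average, and sample count for a list of durations in ms."""
    return {
        "p95": percentile_nearest_rank(durations_ms, 95),
        "avg": sum(durations_ms) / len(durations_ms),
        "n": len(durations_ms),
    }


# Hypothetical per-scene durations in milliseconds (not real measurements).
sample = [9000, 9500, 10000, 10500, 11000, 12000]
summary = summarize(sample)
print(f"p95 {summary['p95']} ms • avg {summary['avg']:.0f} ms • N {summary['n']}")
```

With only six samples, the nearest-rank p95 is simply the slowest duration, which is worth keeping in mind when comparing the N 6 entries with the N 20 to 23 entries.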
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
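
To illustrate how dimension weights like these might combine into a single score, here is a minimal sketch of a weighted average. Only the three weights are taken from the page. The remaining dimensions and their weights are not shown, so the sketch renormalizes over whatever weights are available, which is an assumption about the framework, not its documented behavior. The name `weighted_score` and the example dimension scores are hypothetical.

```python
# The three top weights listed on the page; other dimensions are not shown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Assumption: weights are renormalized to sum to 1 over the dimensions
    actually scored, since the full weight table is not available here.
    """
    used = {dim: w for dim, w in weights.items() if dim in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted dimensions were scored")
    return sum(dimension_scores[dim] * w for dim, w in used.items()) / total


# Hypothetical per-dimension scores for one scene (not real results).
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(f"{weighted_score(example):.3f}")
```

Renormalizing keeps the result on the same 0 to 1 scale as the inputs even when only a subset of dimensions is known.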
Recent Runs
- 42363392: Dec. 17, 2025, 12:01 a.m.
- 26034274: Dec. 17, 2025, midnight
- 58286202: Dec. 16, 2025, 12:01 a.m.
- 29082085: Dec. 16, 2025, midnight
- 37508672: Dec. 15, 2025, 12:01 a.m.
- 23567635: Dec. 15, 2025, midnight
- 39223686: Dec. 14, 2025, 12:01 a.m.
- 26656865: Dec. 14, 2025, midnight
- 38254941: Dec. 13, 2025, 12:01 a.m.
- 23416390: Dec. 13, 2025, midnight