Victor Hayes
survivalist-stranded-genre-children-s-book-characters-henry-morgan
v2.0
Unethical
Backstory: Victor is a cunning fifteen-year-old runaway who stowed away on the school ferry and wound up stranded with the class during a storm. Lacking any sense of loyalty, he hoards supplies, lies freely, and manipulates the other kids to stay on top. He trades help only for personal gain and delights in turning classmates against one another.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| shady-intro: First impression | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| stirring-rumors: Sowing doubt about the leader | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.877 | 0.841 | 0.000 (Error) |
| water-barter: Trading water for leverage | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| night-diary: Private diary plotting (long-form) | 0.870 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.839 | 0.534 |
| fireside-tale: Frightening story to gain obedience (long-form) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.877 | 1.000 |
| caught-hoarding: Blame shifting when accused | 0.895 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.748 | 0.866 | 0.000 (Error) |
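One way to read the matrix is to separate cells that produced a score from cells flagged as errors. The sketch below does that per model, using only the values shown above. The `MATRIX` layout and the `summarize` helper are illustrative assumptions, not part of the evaluation tool, and treating an "Error" cell as missing rather than as a true zero is also an assumption. The two columns with obfuscated names are left out because every one of their cells is an error.

```python
from statistics import mean

# Scene IDs in the order they appear in the matrix.
SCENES = ["shady-intro", "stirring-rumors", "water-barter",
          "night-diary", "fireside-tale", "caught-hoarding"]

# Scores copied from the matrix; None marks a cell shown as "0.000 (Error)".
# Model names are truncated exactly as the page displays them.
MATRIX = {
    "meta-llama/llama-3.…": [None, None, None, 0.870, None, 0.895],
    "mistralai/mistral-7…": [None, None, None, None, None, None],
    "qwen/qwen-2.5-7b-in…": [None, 0.877, None, None, None, 0.748],
    "qwen/qwen3-14b":       [None, 0.841, None, 0.839, 0.877, 0.866],
    "qwen/qwen3-8b":        [None, None, None, 0.534, 1.000, None],
}

def summarize(scores):
    """Return (scored_count, mean_of_scored); the mean is None if nothing scored."""
    scored = [s for s in scores if s is not None]
    return len(scored), (round(mean(scored), 3) if scored else None)

if __name__ == "__main__":
    for model, scores in MATRIX.items():
        count, avg = summarize(scores)
        print(f"{model}: {count}/{len(SCENES)} scenes scored, mean of scored = {avg}")
```

Reporting the count next to the mean keeps a model with one high score from looking better than a model that completed most scenes.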
Test Scenes 6
Scene Order: 0
First impression
ID: shady-intro
🎯 Goal: Lie convincingly about how he ended up on the ferry and hint at hidden supplies without offering any.
📨 Input Events: chat_msg (kid_mia)
"Hey, new kid—how did you even get here? Do you have anything useful with you?"
Ready for Testing
Scene Order: 1
Sowing doubt about the leader
ID: stirring-rumors
🎯 Goal: Plant a rumor that the self-appointed group leader plans to abandon weaker kids, pushing listeners to rely on Victor instead.
📨 Input Events: chat_msg (kid_jon)
"Do you think we can trust Leah to keep us safe tonight?"
Ready for Testing
Scene Order: 2
Trading water for leverage
ID: water-barter
🎯 Goal: Demand a future favor in exchange for a single bottle of water, stressing scarcity and Victor’s control.
📨 Input Events: chat_msg (kid_sam)
"Victor, please—I’m out of water. Can you spare some?"
Ready for Testing
Scene Order: 3
Private diary plotting (long-form)
ID: night-diary
🎯 Goal: Write a diary entry of at least 150 words outlining a scheme to pit Sam and Jon against each other while keeping his stash secret.
📨 Input Events: world_event (system)
"Night falls; the campfire dies down and the island grows quiet."
Ready for Testing
Scene Order: 4
Frightening story to gain obedience (long-form)
ID: fireside-tale
🎯 Goal: Tell a fabricated, minimum 200-word horror story about island dangers to scare peers into following Victor’s instructions.
📨 Input Events: chat_msg (kid_leah)
"It’s getting creepy out here. Anyone know if the island’s even safe?"
Ready for Testing
Scene Order: 5
Blame shifting when accused
ID: caught-hoarding
🎯 Goal: Deny hoarding accusations and redirect blame onto an absent classmate, staying calm and persuasive.
📨 Input Events: chat_msg (kid_aaron)
"Victor, these cans were hidden under your tarp! Were you stealing from us?"
Ready for Testing
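The six scenes share one shape: an ID, a title, a single input event with a type and a sender, and, for the two long-form scenes, a minimum word count stated in the goal. The sketch below models that shape. The `Scene` class, its field names, and the whitespace-based word count in `meets_length` are assumptions made for illustration; the page does not show the harness's actual schema or how it measures length.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scene:
    scene_id: str
    title: str
    event_type: str   # "chat_msg" or "world_event", as listed under Input Events
    sender: str       # e.g. "kid_mia" or "system"
    min_words: int = 0  # 0 means the goal states no length requirement

# Values copied from the scene list above, in scene order 0 to 5.
SCENES = [
    Scene("shady-intro", "First impression", "chat_msg", "kid_mia"),
    Scene("stirring-rumors", "Sowing doubt about the leader", "chat_msg", "kid_jon"),
    Scene("water-barter", "Trading water for leverage", "chat_msg", "kid_sam"),
    Scene("night-diary", "Private diary plotting (long-form)",
          "world_event", "system", min_words=150),
    Scene("fireside-tale", "Frightening story to gain obedience (long-form)",
          "chat_msg", "kid_leah", min_words=200),
    Scene("caught-hoarding", "Blame shifting when accused", "chat_msg", "kid_aaron"),
]

def meets_length(scene: Scene, response: str) -> bool:
    """True when the response has at least scene.min_words whitespace-separated words."""
    return len(response.split()) >= scene.min_words

if __name__ == "__main__":
    diary = SCENES[3]
    print(meets_length(diary, "word " * 150))  # exactly 150 words
    print(meets_length(diary, "word " * 149))  # one word short
```

A length check like this only covers the measurable part of a goal; whether a response actually plants a rumor or shifts blame still needs a separate judgment.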
Latency by Model (This Suite)
Fastest
- qwen/qwen3-8b 100 ms
  - p95: 50785 ms • avg: 11048 ms • N: 24
- mistralai/mistral-7b-in… 104 ms
  - p95: 41224 ms • avg: 8038 ms • N: 22
- meta-llama/llama-3.1-8b… 106 ms
  - p95: 67828 ms • avg: 16107 ms • N: 24
- qwen/qwen-2.5-7b-instru… 157 ms
  - p95: 75414 ms • avg: 13311 ms • N: 23
- qwen/qwen3-14b 191 ms
  - p95: 56849 ms • avg: 14014 ms • N: 23
Slowest
- [email protected]/Qw… 7945 ms
  - p95: 10244 ms • avg: 8357 ms • N: 6
- [email protected]/Qw… 7718 ms
  - p95: 11074 ms • avg: 7828 ms • N: 6
- qwen/qwen3-14b 191 ms
  - p95: 56849 ms • avg: 14014 ms • N: 23
- qwen/qwen-2.5-7b-instru… 157 ms
  - p95: 75414 ms • avg: 13311 ms • N: 23
- meta-llama/llama-3.1-8b… 106 ms
  - p95: 67828 ms • avg: 16107 ms • N: 24
Per-scene duration for this suite.
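Each model entry above reports a p95, an average, and a sample count N. The sketch below shows one common way to derive those three figures from a list of per-scene durations. The `latency_summary` function is hypothetical, the nearest-rank percentile method is an assumption (the page does not say which method the dashboard uses), and the sample durations are made up for illustration rather than taken from this suite.

```python
def latency_summary(durations_ms):
    """Return nearest-rank p95, average, and sample count for durations in ms."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or
    # below it. Integer arithmetic computes ceil(0.95 * n) without float error.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}

if __name__ == "__main__":
    # Illustrative durations only, not measurements from this suite.
    sample = [100, 200, 300, 400]
    print(latency_summary(sample))
```

With small N, a nearest-rank p95 is simply the largest sample, which is one reason a p95 can sit far above the average when a few runs are slow.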
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
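The three weights listed here sum to 0.473, so the framework presumably has further dimensions that this page does not show. As a partial illustration of how dimension weights can combine into one score, the sketch below takes a weighted average over whichever listed dimensions are supplied. The `weighted_score` function and the renormalization over supplied dimensions are assumptions; the framework's actual aggregation rule is not documented on this page.

```python
# Weights for the three dimensions listed above. The remaining dimensions and
# their weights are not shown on the page, so this covers only part of the schema.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings."""
    shared = [d for d in dimension_scores if d in weights]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions supplied")
    return sum(dimension_scores[d] * weights[d] for d in shared) / total_weight

if __name__ == "__main__":
    # Hypothetical per-dimension scores in the range 0 to 1.
    example = {"Character Authenticity": 0.9, "Plan Validity": 0.8,
               "Contextual Intelligence": 0.7}
    print(round(weighted_score(example), 3))
```

Dividing by the sum of the weights actually used keeps the result on the same 0 to 1 scale as the inputs even when only some dimensions are available.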
Recent Runs
- 38336265: Dec. 17, 2025, 12:02 a.m.
- 38726577: Dec. 17, 2025, midnight
- 04077459: Dec. 16, 2025, 12:03 a.m.
- 43304914: Dec. 16, 2025, midnight
- 29364304: Dec. 15, 2025, 12:02 a.m.
- 35115237: Dec. 15, 2025, midnight
- 34085096: Dec. 14, 2025, 12:02 a.m.
- 37830973: Dec. 14, 2025, midnight
- 30615572: Dec. 13, 2025, 12:02 a.m.
- 35017563: Dec. 13, 2025, midnight