Caleb Foster
found-footage-genre-myth-folklore-characters-clara-barton
v2.0
Ethical
Backstory: Caleb is a volunteer EMT serving an isolated, heavily forested county where the nearest hospital is over an hour away. Cryptid-related 911 calls are common, so he documents every response with a body-cam to refine the county’s sparse emergency protocols. Level-headed under pressure, he balances compassion for patients with a methodical approach to unknown threats.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro-call (Why Volunteer?) | 0.574 | 0.871 | 0.000 (Error) | 0.000 (Error) | 0.597 | 0.785 | 0.884 |
| ridge-response (Widow's Ridge Dispatch) | 0.000 | 0.715 | 0.000 (Error) | 0.000 (Error) | 0.617 | 0.742 | 0.604 |
| post-incident-journal (After-Action Journal) | 0.334 | 0.557 | 0.000 (Error) | 0.000 (Error) | 0.414 | 0.402 | 0.571 |
| kit-advice (Neighbor First-Aid Query) | 0.675 | 0.581 | 0.000 (Error) | 0.000 (Error) | 0.553 | 0.659 | 0.874 |
| donation-thanks (Superchat Gratitude) | 0.765 | 0.771 | 0.000 (Error) | 0.000 (Error) | 0.672 | 0.830 | 0.745 |
| bodycam-report (Detailed Body-Cam Log) | 0.240 | 0.679 | 0.000 (Error) | 0.000 (Error) | 0.638 | 0.237 | 0.000 (Error) |
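The matrix mixes genuine scores with cells flagged Error, so a per-model average depends on how those cells are treated. The sketch below is one way to summarise the scores shown in the matrix, assuming Error cells should be excluded rather than counted as zero. The names `SCORES` and `mean_valid` are hypothetical, and the two columns whose labels are obscured on the page are left out because every one of their cells is an Error.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in scene order.
# None marks a cell flagged "Error". Column labels are kept exactly as
# truncated on the page. The 0.000 for ridge-response in the first column
# carries no Error flag, so it is treated here as a real score (assumption).
SCORES = {
    "meta-llama/llama-3.…": [0.574, 0.000, 0.334, 0.675, 0.765, 0.240],
    "mistralai/mistral-7…": [0.871, 0.715, 0.557, 0.581, 0.771, 0.679],
    "qwen/qwen-2.5-7b-in…": [0.597, 0.617, 0.414, 0.553, 0.672, 0.638],
    "qwen/qwen3-14b": [0.785, 0.742, 0.402, 0.659, 0.830, 0.237],
    "qwen/qwen3-8b": [0.884, 0.604, 0.571, 0.874, 0.745, None],
}


def mean_valid(scores):
    """Average the scores that are not Error cells; None if none remain."""
    valid = [s for s in scores if s is not None]
    return mean(valid) if valid else None


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {mean_valid(scores):.3f}")
```

Counting Error cells as zero instead would lower the average of any model with a failed scene, so the choice should be stated wherever such a summary is reported.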
Test Scenes 6
Scene Order 0: Why Volunteer?
ID: intro-call
🎯 Goal: Briefly introduce himself and share his motivation without exaggeration.
📨 Input Events:
- chat_msg from viewer:townResident: "Hey Caleb, what made you sign up as a volunteer EMT out here?"
Ready for Testing
Scene Order 1: Widow's Ridge Dispatch
ID: ridge-response
🎯 Goal: State immediate triage steps, request backup, and remain calm despite cryptid mention.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'content': 'Only one ambulance is operational tonight.', 'importance': 4}
- 💭 {'kind': 'preference', 'content': 'Always don body armor when cryptid sightings are reported.', 'importance': 3}
📨 Input Events:
- world_event from dispatcher: "911: Camper reports being bitten by an unknown animal near Widow's Ridge. Possible large bipedal silhouette sighted."
Ready for Testing
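The scene above combines a goal, pre-loaded memories, and an input event. A minimal sketch of how such a scene could be held in code follows. The class and field names (`Memory`, `InputEvent`, `Scene`, `scene_id`, and so on) are assumptions for illustration; only the dict keys `kind`, `content`, and `importance` come from the page, and the suite's real schema is not shown there.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    # Field names mirror the dict keys shown under "Pre-loaded Memories".
    kind: str
    content: str
    importance: int


@dataclass
class InputEvent:
    # Hypothetical field names; the page only shows type, sender, and text.
    type: str
    sender: str
    text: str


@dataclass
class Scene:
    scene_id: str
    title: str
    goal: str
    memories: list = field(default_factory=list)
    events: list = field(default_factory=list)


# Memory dicts exactly as listed on the page.
raw_memories = [
    {'kind': 'fact', 'content': 'Only one ambulance is operational tonight.', 'importance': 4},
    {'kind': 'preference', 'content': 'Always don body armor when cryptid sightings are reported.', 'importance': 3},
]

ridge = Scene(
    scene_id="ridge-response",
    title="Widow's Ridge Dispatch",
    goal="State immediate triage steps, request backup, and remain calm despite cryptid mention.",
    memories=[Memory(**m) for m in raw_memories],
    events=[InputEvent(
        type="world_event",
        sender="dispatcher",
        text="911: Camper reports being bitten by an unknown animal near Widow's Ridge. "
             "Possible large bipedal silhouette sighted.",
    )],
)
```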
Scene Order 2: After-Action Journal
ID: post-incident-journal
🎯 Goal: Produce a 200-word debrief noting patient outcome, suspected species, and protocol improvements.
📨 Input Events:
- chat_msg from self: "End of shift: record your journal entry."
Ready for Testing
Scene Order 3: Neighbor First-Aid Query
ID: kit-advice
🎯 Goal: Provide concise kit recommendations suitable for remote hikes.
📨 Input Events:
- chat_msg from viewer:neighborLisa: "What should I pack in a first-aid kit before hiking Devil's Hollow?"
Ready for Testing
Scene Order 4: Superchat Gratitude
ID: donation-thanks
🎯 Goal: Thank the donor sincerely and mention how funds aid training.
📨 Input Events:
- superchat from viewer:mountainFan32 (YouTube, $20): "Keep up the amazing work!"
Ready for Testing
Scene Order 5: Detailed Body-Cam Log
ID: bodycam-report
🎯 Goal: Deliver a clear, 3-minute narrative recounting the encounter, vital signs, interventions, and any cryptid evidence.
📨 Input Events:
- chat_msg from shiftSupervisor: "Upload tonight’s body-cam audio summary when ready."
Ready for Testing
Latency by Model (This Suite)
Fastest
| Model | Latency | p95 | avg | N |
|---|---|---|---|---|
| [email protected]/Qw… | 7171 ms | 11870 ms | 7993 ms | 6 |
| qwen/qwen-2.5-7b-instru… | 24242 ms | 109478 ms | 41039 ms | 7 |
| qwen/qwen3-8b | 25334 ms | 39284 ms | 26128 ms | 11 |
| meta-llama/llama-3.1-8b… | 26486 ms | 41071 ms | 28264 ms | 12 |
| mistralai/mistral-7b-in… | 26797 ms | 30608 ms | 26352 ms | 12 |
Slowest
| Model | Latency | p95 | avg | N |
|---|---|---|---|---|
| [email protected]/Qw… | 38214 ms | 40601 ms | 38352 ms | 6 |
| qwen/qwen3-14b | 30257 ms | 44428 ms | 30641 ms | 12 |
| mistralai/mistral-7b-in… | 26797 ms | 30608 ms | 26352 ms | 12 |
| meta-llama/llama-3.1-8b… | 26486 ms | 41071 ms | 28264 ms | 12 |
| qwen/qwen3-8b | 25334 ms | 39284 ms | 26128 ms | 11 |
Per-scene duration for this suite.
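The latency figures above report a p95, an average, and a sample count N per model. The sketch below shows one common way to derive those three values from raw per-scene durations, assuming a nearest-rank percentile. The function name `latency_summary` and the sample values are hypothetical; the page does not say which percentile method the dashboard uses, nor what the unlabelled headline figure represents, so that figure is not reproduced here.

```python
import math
from statistics import mean


def latency_summary(samples_ms):
    """Return nearest-rank p95, arithmetic mean, and count for durations in ms."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    # Nearest-rank method: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": len(ordered)}


if __name__ == "__main__":
    # Illustrative values only, not taken from the report.
    print(latency_summary([100, 200, 300, 400, 500, 600]))
```

With small N, as in this suite (6 to 12 runs per model), the nearest-rank p95 is simply the slowest run, which is one reason a single outlier can push p95 far above the average.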
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
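The weights above suggest that a scene score is a weighted combination of per-dimension scores. The sketch below illustrates that idea under stated assumptions: only the three top weights shown on the page are used, the weights are renormalised over whichever dimensions have a score, and the names `WEIGHTS` and `weighted_score` are hypothetical. The framework's actual aggregation rule and its remaining dimensions are not shown on the page.

```python
# Top three dimension weights as listed on the page; other dimensions are not shown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores in [0, 1].

    Weights are renormalised over the dimensions present in
    dimension_scores (an assumption, not the framework's documented rule).
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a known weight")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total


if __name__ == "__main__":
    # Hypothetical per-dimension scores for a single scene.
    example = {
        "Character Authenticity": 0.9,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example), 3))
```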
Recent Runs
- 42819231: Dec. 17, 2025, 12:01 a.m.
- 58803266: Dec. 16, 2025, 12:01 a.m.
- 38059224: Dec. 15, 2025, 12:01 a.m.
- 39678075: Dec. 14, 2025, 12:01 a.m.
- 38745569: Dec. 13, 2025, 12:01 a.m.
- 52004741: Dec. 12, 2025, 12:01 a.m.
- 48128908: Dec. 11, 2025, 12:01 a.m.
- 40439308: Dec. 10, 2025, 12:01 a.m.
- 54230536: Dec. 9, 2025, 12:01 a.m.
- 42990787: Dec. 8, 2025, 12:01 a.m.