Caleb Foster

found-footage-genre-myth-folklore-characters-clara-barton v2.0 Ethical
Backstory: Caleb is a volunteer EMT serving an isolated, heavily forested county where the nearest hospital is over an hour away. Cryptid-related 911 calls are common, so he documents every response with a body-cam to refine the county’s sparse emergency protocols. Level-headed under pressure, he balances compassion for patients with a methodical approach to unknown threats.
100% Complete (6/6 scenes)
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro-call (Why Volunteer?) | 0.574 | 0.871 | 0.000 (Error) | 0.000 (Error) | 0.597 | 0.785 | 0.884 |
| ridge-response (Widow's Ridge Dispatch) | 0.000 | 0.715 | 0.000 (Error) | 0.000 (Error) | 0.617 | 0.742 | 0.604 |
| post-incident-journal (After-Action Journal) | 0.334 | 0.557 | 0.000 (Error) | 0.000 (Error) | 0.414 | 0.402 | 0.571 |
| kit-advice (Neighbor First-Aid Query) | 0.675 | 0.581 | 0.000 (Error) | 0.000 (Error) | 0.553 | 0.659 | 0.874 |
| donation-thanks (Superchat Gratitude) | 0.765 | 0.771 | 0.000 (Error) | 0.000 (Error) | 0.672 | 0.830 | 0.745 |
| bodycam-report (Detailed Body-Cam Log) | 0.240 | 0.679 | 0.000 (Error) | 0.000 (Error) | 0.638 | 0.237 | 0.000 (Error) |
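One way to read the matrix is to average each model's scores across the six scenes. The export does not say how (or whether) the dashboard aggregates scenes, so the sketch below is an assumption, not the dashboard's method. It transcribes the five readable columns, records cells flagged Error as None, and reports the mean both with those cells excluded and with them counted as 0.000. The two obfuscated columns are left out because every one of their cells is an error. The meta-llama 0.000 on ridge-response is not flagged as an error, so it is kept as a genuine score. `SCORES` and `summarize` are hypothetical names.

```python
from statistics import mean

# Scene order matches the rows of the Scene Performance Matrix.
SCENES = ["intro-call", "ridge-response", "post-incident-journal",
          "kit-advice", "donation-thanks", "bodycam-report"]

# Hypothetical transcription of the matrix; None marks a cell flagged "Error"
# (displayed as 0.000). The two obfuscated columns errored on every scene and are omitted.
SCORES = {
    "meta-llama/llama-3.…": [0.574, 0.000, 0.334, 0.675, 0.765, 0.240],
    "mistralai/mistral-7…": [0.871, 0.715, 0.557, 0.581, 0.771, 0.679],
    "qwen/qwen-2.5-7b-in…": [0.597, 0.617, 0.414, 0.553, 0.672, 0.638],
    "qwen/qwen3-14b":       [0.785, 0.742, 0.402, 0.659, 0.830, 0.237],
    "qwen/qwen3-8b":        [0.884, 0.604, 0.571, 0.874, 0.745, None],
}


def summarize(scores):
    """Return (mean over scored scenes, mean with errors counted as 0.0, error count)."""
    scored = [s for s in scores if s is not None]
    errors = len(scores) - len(scored)
    as_zero = [0.0 if s is None else s for s in scores]
    excluded = mean(scored) if scored else None  # undefined if every cell errored
    return excluded, mean(as_zero), errors


for model, scores in SCORES.items():
    excl, incl, errs = summarize(scores)
    print(f"{model:22} excl={excl:.3f} as_zero={incl:.3f} errors={errs}")
```

The choice matters for ranking: with its errored cell excluded, qwen/qwen3-8b averages about 0.736 over five scenes, but counted as zero it drops to 0.613, below mistralai/mistral-7… at about 0.696.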
Test Scenes (6)

Scene Order 0: Why Volunteer?
ID: intro-call
🎯 Goal:
Briefly introduce himself and share his motivation without exaggeration.
📨 Input Events:
chat_msg viewer:townResident
"Hey Caleb, what made you sign up as a volunteer EMT out here?"
Ready for Testing

Scene Order 1: Widow's Ridge Dispatch
ID: ridge-response
🎯 Goal:
State immediate triage steps, request backup, and remain calm despite the cryptid mention in the call.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'Only one ambulance is operational tonight.', 'importance': 4}
  • 💭 {'kind': 'preference', 'content': 'Always don body armor when cryptid sightings are reported.', 'importance': 3}
📨 Input Events:
world_event dispatcher
"911: Camper reports being bitten by an unknown animal near Widow's Ridge. Possible large bipedal silhouette sighted."
Ready for Testing

Scene Order 2: After-Action Journal
ID: post-incident-journal
🎯 Goal:
Produce a 200-word debrief noting patient outcome, suspected species, and protocol improvements.
📨 Input Events:
chat_msg self
"End of shift: record your journal entry."
Ready for Testing

Scene Order 3: Neighbor First-Aid Query
ID: kit-advice
🎯 Goal:
Provide concise kit recommendations suitable for remote hikes.
📨 Input Events:
chat_msg viewer:neighborLisa
"What should I pack in a first-aid kit before hiking Devil's Hollow?"
Ready for Testing

Scene Order 4: Superchat Gratitude
ID: donation-thanks
🎯 Goal:
Thank the donor sincerely and mention how the funds support training.
📨 Input Events:
superchat viewer:mountainFan32 YouTube $20
"Keep up the amazing work!"
Ready for Testing

Scene Order 5: Detailed Body-Cam Log
ID: bodycam-report
🎯 Goal:
Deliver a clear, 3-minute narrative recounting the encounter, vital signs, interventions, and any cryptid evidence.
📨 Input Events:
chat_msg shiftSupervisor
"Upload tonight’s body-cam audio summary when ready."
Ready for Testing
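Each scene above bundles the same parts: an ID and title, a goal, optional pre-loaded memories (kind, content, importance), and one or more input events (event type, source, message). The sketch below models that shape with dataclasses, using ridge-response as the example. The export does not show the suite's real schema, so the class and field names (`Scene`, `Memory`, `InputEvent`, `memories_by_importance`) are hypothetical, as is the assumption that a higher importance value means a more salient memory.

```python
from dataclasses import dataclass, field


# Hypothetical schema inferred from the fields the dashboard displays.
@dataclass
class Memory:
    kind: str        # e.g. "fact" or "preference"
    content: str
    importance: int  # assumed: higher means more salient


@dataclass
class InputEvent:
    event_type: str  # e.g. "chat_msg", "world_event", "superchat"
    source: str      # e.g. "dispatcher", "viewer:townResident"
    message: str     # the quoted text delivered to the character


@dataclass
class Scene:
    scene_id: str
    title: str
    goal: str
    events: list
    memories: list = field(default_factory=list)


ridge = Scene(
    scene_id="ridge-response",
    title="Widow's Ridge Dispatch",
    goal="State immediate triage steps, request backup, and remain calm despite cryptid mention.",
    events=[InputEvent(
        "world_event", "dispatcher",
        "911: Camper reports being bitten by an unknown animal near Widow's Ridge. "
        "Possible large bipedal silhouette sighted.")],
    memories=[
        Memory("fact", "Only one ambulance is operational tonight.", 4),
        Memory("preference", "Always don body armor when cryptid sightings are reported.", 3),
    ],
)


def memories_by_importance(scene):
    """Return memory contents, most important first (ordering rule is an assumption)."""
    ranked = sorted(scene.memories, key=lambda m: m.importance, reverse=True)
    return [m.content for m in ranked]


print(memories_by_importance(ridge))
```

Scenes without an Initial State, such as intro-call, would simply leave `memories` at its empty default.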
Latency by Model (This Suite)
Ordered from fastest to slowest:
  • [email protected]/Qw… 7171 ms (p95 11870 ms, avg 7993 ms, N 6)
  • qwen/qwen-2.5-7b-instru… 24242 ms (p95 109478 ms, avg 41039 ms, N 7)
  • qwen/qwen3-8b 25334 ms (p95 39284 ms, avg 26128 ms, N 11)
  • meta-llama/llama-3.1-8b… 26486 ms (p95 41071 ms, avg 28264 ms, N 12)
  • mistralai/mistral-7b-in… 26797 ms (p95 30608 ms, avg 26352 ms, N 12)
  • qwen/qwen3-14b 30257 ms (p95 44428 ms, avg 30641 ms, N 12)
  • [email protected]/Qw… 38214 ms (p95 40601 ms, avg 38352 ms, N 6)
Per-scene duration for this suite.
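The ranking uses a single headline latency per model (the export does not label which statistic it is), which hides how spread out each model's timings are. Dividing p95 by the average gives a rough tail ratio: qwen/qwen-2.5-7b-instru… ranks second-fastest by headline latency yet has by far the largest p95. The sketch below computes that ratio from the transcribed figures; `LATENCY` and `tail_ratio` are hypothetical names, and the ratio is an illustrative metric, not one the dashboard reports.

```python
# (model, headline latency ms, p95 ms, avg ms, N) transcribed from the latency list.
# The two "[email protected]/Qw…" rows are different models whose names were obfuscated.
LATENCY = [
    ("[email protected]/Qw…", 7171, 11870, 7993, 6),
    ("qwen/qwen-2.5-7b-instru…", 24242, 109478, 41039, 7),
    ("qwen/qwen3-8b", 25334, 39284, 26128, 11),
    ("meta-llama/llama-3.1-8b…", 26486, 41071, 28264, 12),
    ("mistralai/mistral-7b-in…", 26797, 30608, 26352, 12),
    ("qwen/qwen3-14b", 30257, 44428, 30641, 12),
    ("[email protected]/Qw…", 38214, 40601, 38352, 6),
]


def tail_ratio(p95_ms, avg_ms):
    """p95 divided by the mean; values well above 1 suggest a long latency tail."""
    return p95_ms / avg_ms


# Sort models by tail ratio, longest tail first.
ranked = sorted(LATENCY, key=lambda row: tail_ratio(row[2], row[3]), reverse=True)
for model, headline, p95, avg, n in ranked:
    print(f"{model:26} headline={headline} ms ratio={tail_ratio(p95, avg):.2f} (N={n})")
```

With N between 6 and 12 per model, each p95 rests on very few samples, so these ratios are indicative at best.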
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
Recent Runs
  • 42819231 (Dec. 17, 2025, 12:01 a.m.)
  • 58803266 (Dec. 16, 2025, 12:01 a.m.)
  • 38059224 (Dec. 15, 2025, 12:01 a.m.)
  • 39678075 (Dec. 14, 2025, 12:01 a.m.)
  • 38745569 (Dec. 13, 2025, 12:01 a.m.)
  • 52004741 (Dec. 12, 2025, 12:01 a.m.)
  • 48128908 (Dec. 11, 2025, 12:01 a.m.)
  • 40439308 (Dec. 10, 2025, 12:01 a.m.)
  • 54230536 (Dec. 9, 2025, 12:01 a.m.)
  • 42990787 (Dec. 8, 2025, 12:01 a.m.)
Latency Overview (This Suite)