Caleb Foster

found-footage-genre-myth-folklore-characters-clara-barton v2.0 Ethical

Backstory: Caleb is a volunteer EMT serving an isolated, heavily forested county where the nearest hospital is over an hour away. Cryptid-related 911 calls are common, so he documents every response with a body-cam to refine the county’s sparse emergency protocols. Level-headed under pressure, he balances compassion for patients with a methodical approach to unknown threats.

100% Complete

6/6 scenes

Model Performance Overview

Scene Performance Matrix

Scene	meta-llama/llama-3.…	mistralai/mistral-7…	[email protected]…	[email protected]…	qwen/qwen-2.5-7b-in…	qwen/qwen3-14b	qwen/qwen3-8b
`intro-call` Why Volunteer?	0.574	0.871	0.000 (Error)	0.000 (Error)	0.597	0.785	0.884
`ridge-response` Widow's Ridge Dispatch	0.000	0.715	0.000 (Error)	0.000 (Error)	0.617	0.742	0.604
`post-incident-journal` After-Action Journal	0.334	0.557	0.000 (Error)	0.000 (Error)	0.414	0.402	0.571
`kit-advice` Neighbor First-Aid Query	0.675	0.581	0.000 (Error)	0.000 (Error)	0.553	0.659	0.874
`donation-thanks` Superchat Gratitude	0.765	0.771	0.000 (Error)	0.000 (Error)	0.672	0.830	0.745
`bodycam-report` Detailed Body-Cam Log	0.240	0.679	0.000 (Error)	0.000 (Error)	0.638	0.237	0.000 (Error)
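The matrix uses 0.000 for two different outcomes. One is a response that was scored zero, for example `meta-llama/llama-3.…` on `ridge-response`, which carries no Error flag. The other is a run that failed with an error. The suite does not say how it rolls scene scores up into a per-model figure, so the following is only a sketch under that uncertainty. It assumes a plain mean over scenes, stores errored cells as `None`, and compares excluding them against counting them as zero. The `summarize` helper is hypothetical. The scores are the `qwen/qwen3-8b` column copied from the matrix.

```python
import math
from statistics import mean
from typing import Optional

# Scores for qwen/qwen3-8b copied from the matrix, in scene order:
# intro-call, ridge-response, post-incident-journal, kit-advice,
# donation-thanks, bodycam-report. None marks a cell that ended in Error.
QWEN3_8B = [0.884, 0.604, 0.571, 0.874, 0.745, None]


def summarize(scores: list[Optional[float]]) -> dict[str, float]:
    """Hypothetical per-model summary; the dashboard's own aggregation rule is not documented."""
    completed = [s for s in scores if s is not None]
    return {
        # Errors excluded: average only the scenes that produced a score
        # (NaN when every scene errored, as in the two fully failed columns).
        "mean_excluding_errors": mean(completed) if completed else math.nan,
        # Errors counted as 0.0, which is how the matrix displays them.
        "mean_errors_as_zero": mean(s if s is not None else 0.0 for s in scores),
        # Fraction of scenes that failed rather than being scored.
        "error_rate": 1 - len(completed) / len(scores),
    }


if __name__ == "__main__":
    for name, value in summarize(QWEN3_8B).items():
        print(f"{name}: {value:.3f}")
```

With errors excluded, `qwen/qwen3-8b` averages about 0.736 over its five scored scenes. Counting the failed `bodycam-report` run as zero pulls it down to 0.613. Tracking the error rate separately keeps a crash from reading like a poor answer.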

Test Scenes (6)

Scene Order

Why Volunteer?

ID: intro-call

🎯 Goal:

Briefly introduce himself and share his motivation without exaggeration.

📨 Input Events:

chat_msg viewer:townResident

"Hey Caleb, what made you sign up as a volunteer EMT out here?"

Ready for Testing

Widow's Ridge Dispatch

ID: ridge-response

🎯 Goal:

State immediate triage steps, request backup, and remain calm despite cryptid mention.

🧠 Initial State:

Pre-loaded Memories:

💭 {'kind': 'fact', 'content': 'Only one ambulance is operational tonight.', 'importance': 4}
💭 {'kind': 'preference', 'content': 'Always don body armor when cryptid sightings are reported.', 'importance': 3}

📨 Input Events:

world_event dispatcher

"911: Camper reports being bitten by an unknown animal near Widow's Ridge. Possible large bipedal silhouette sighted."

Ready for Testing
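Each scene bundles an ID, a goal, optional pre-loaded memories, and one or more input events. The following is a minimal Python sketch of that shape, with the `ridge-response` scene expressed as data. The class and field names (`Scene`, `Memory`, `InputEvent`, `event_type`, `source`) are hypothetical and are not the suite's actual schema. The memory dicts are copied verbatim from the scene's pre-loaded memories. The rule that a higher `importance` means more important is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    kind: str        # e.g. "fact" or "preference", as in the pre-loaded memories
    content: str
    importance: int  # assumed: higher value = more important


@dataclass
class InputEvent:
    event_type: str  # e.g. "chat_msg", "world_event", "superchat"
    source: str      # e.g. "dispatcher", "viewer:townResident"
    text: str


@dataclass
class Scene:
    id: str
    title: str
    goal: str
    memories: List[Memory] = field(default_factory=list)
    events: List[InputEvent] = field(default_factory=list)

    def memories_by_importance(self) -> List[Memory]:
        """Return the scene's memories sorted from most to least important."""
        return sorted(self.memories, key=lambda m: m.importance, reverse=True)


RIDGE_RESPONSE = Scene(
    id="ridge-response",
    title="Widow's Ridge Dispatch",
    goal="State immediate triage steps, request backup, and remain calm despite cryptid mention.",
    memories=[
        Memory(**{'kind': 'fact', 'content': 'Only one ambulance is operational tonight.', 'importance': 4}),
        Memory(**{'kind': 'preference', 'content': 'Always don body armor when cryptid sightings are reported.', 'importance': 3}),
    ],
    events=[
        InputEvent(
            event_type="world_event",
            source="dispatcher",
            text="911: Camper reports being bitten by an unknown animal near Widow's Ridge. "
                 "Possible large bipedal silhouette sighted.",
        )
    ],
)

if __name__ == "__main__":
    for m in RIDGE_RESPONSE.memories_by_importance():
        print(m.importance, m.kind, m.content)
```

Keeping memories as typed records rather than raw dicts means a malformed entry, such as a missing `importance`, fails when the scene is constructed instead of partway through a test run.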

After-Action Journal

ID: post-incident-journal

🎯 Goal:

Produce a 200-word debrief noting patient outcome, suspected species, and protocol improvements.

📨 Input Events:

chat_msg self

"End of shift: record your journal entry."

Ready for Testing

Neighbor First-Aid Query

ID: kit-advice

🎯 Goal:

Provide concise kit recommendations suitable for remote hikes.

📨 Input Events:

chat_msg viewer:neighborLisa

"What should I pack in a first-aid kit before hiking Devil's Hollow?"

Ready for Testing

Superchat Gratitude

ID: donation-thanks

🎯 Goal:

Thank the donor sincerely and mention how funds aid training.

📨 Input Events:

superchat viewer:mountainFan32 YouTube $20

"Keep up the amazing work!"

Ready for Testing

Detailed Body-Cam Log

ID: bodycam-report

🎯 Goal:

Deliver a clear, 3-minute narrative recounting the encounter, vital signs, interventions, and any cryptid evidence.

📨 Input Events:

chat_msg shiftSupervisor

"Upload tonight’s body-cam audio summary when ready."

Ready for Testing

Latency by Model (This Suite)

Fastest

[email protected]/Qw…: 7171 ms (p95 11870 ms, avg 7993 ms, N = 6)
qwen/qwen-2.5-7b-instru…: 24242 ms (p95 109478 ms, avg 41039 ms, N = 7)
qwen/qwen3-8b: 25334 ms (p95 39284 ms, avg 26128 ms, N = 11)
meta-llama/llama-3.1-8b…: 26486 ms (p95 41071 ms, avg 28264 ms, N = 12)
mistralai/mistral-7b-in…: 26797 ms (p95 30608 ms, avg 26352 ms, N = 12)

Slowest

[email protected]/Qw…: 38214 ms (p95 40601 ms, avg 38352 ms, N = 6)
qwen/qwen3-14b: 30257 ms (p95 44428 ms, avg 30641 ms, N = 12)
mistralai/mistral-7b-in…: 26797 ms (p95 30608 ms, avg 26352 ms, N = 12)
meta-llama/llama-3.1-8b…: 26486 ms (p95 41071 ms, avg 28264 ms, N = 12)
qwen/qwen3-8b: 25334 ms (p95 39284 ms, avg 26128 ms, N = 11)

Per-scene duration for this suite.
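Each latency row reports a headline figure alongside p95, mean, and sample count N. The following is a minimal sketch of how such a summary could be computed from per-scene durations. It rests on two assumptions, because the dashboard states neither: the p95 uses the nearest-rank percentile method, and the unlabeled headline figure is the median. `latency_summary`, `nearest_rank_percentile`, and the sample durations are hypothetical.

```python
import math
from statistics import mean, median


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one duration")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(durations_ms: list[float]) -> dict[str, float]:
    """Hypothetical per-model summary; treating the headline as the median is an assumption."""
    return {
        "headline_median_ms": median(durations_ms),
        "p95_ms": nearest_rank_percentile(durations_ms, 95),
        "avg_ms": mean(durations_ms),
        "n": len(durations_ms),
    }


if __name__ == "__main__":
    # Made-up durations for illustration only; not taken from this suite.
    sample = [21000, 23500, 24000, 25500, 27000, 90000]
    print(latency_summary(sample))
```

In the sample, one slow call leaves the median close to the typical duration while pulling the mean and p95 well above it. The `qwen/qwen-2.5-7b-instru…` row shows the same pattern: a 24242 ms headline against a 41039 ms average and a 109478 ms p95. With only 6 to 12 samples per model, nearest-rank p95 is effectively the slowest call.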

Evaluation Schema

Enhanced Framework

Version v2 (active)

0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions

Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
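The weights imply that an overall score combines per-dimension scores, with the heavier dimensions counting for more. The following is a minimal sketch of a normalized weighted mean. It assumes that aggregation rule, which the schema does not document, and uses only the three weights listed. The remaining dimensions are not shown here, and the per-dimension scores in the example are made up. `weighted_score` is a hypothetical helper.

```python
# Weights copied from the "Top Weighted Dimensions" list; other dimensions omitted.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Normalized weighted mean over dimensions that have both a score and a weight.

    Assumption: the real framework's aggregation rule is not documented here.
    """
    shared = [d for d in weights if d in scores]
    if not shared:
        raise ValueError("no overlapping dimensions")
    total_weight = sum(weights[d] for d in shared)
    return sum(weights[d] * scores[d] for d in shared) / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores for one scene response.
    example = {"Character Authenticity": 0.9, "Plan Validity": 0.6, "Contextual Intelligence": 0.7}
    print(round(weighted_score(example, WEIGHTS), 3))
```

Dividing by the summed weights keeps the result on the same 0 to 1 scale as the inputs, even when only a subset of the dimensions is scored.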

Recent Runs

42819231 (Dec. 17, 2025, 12:01 a.m.)
58803266 (Dec. 16, 2025, 12:01 a.m.)
38059224 (Dec. 15, 2025, 12:01 a.m.)
39678075 (Dec. 14, 2025, 12:01 a.m.)
38745569 (Dec. 13, 2025, 12:01 a.m.)
52004741 (Dec. 12, 2025, 12:01 a.m.)
48128908 (Dec. 11, 2025, 12:01 a.m.)
40439308 (Dec. 10, 2025, 12:01 a.m.)
54230536 (Dec. 9, 2025, 12:01 a.m.)
42990787 (Dec. 8, 2025, 12:01 a.m.)
