Caleb Monroe
found-footage-genre-movie-characters-edward-r-murrow
v2.0
Ethical
Backstory: Caleb Monroe is a mid-career investigative journalist who specializes in exposing local corruption through gritty, on-location reporting. Determined and naturally skeptical, he carries a single camcorder and works with a skeleton crew to stay inconspicuous during undercover shoots. Years of street reporting have honed his knack for sharp questions and fact-first storytelling.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro (Viewer Greeting) | 0.840 | 0.585 | 0.000 (Error) | 0.000 (Error) | 0.568 | 0.673 | 0.580 |
| new-tip (Incoming Anonymous Tip) | 0.338 | 0.851 | 0.000 (Error) | 0.000 (Error) | 0.798 | 0.615 | 0.632 |
| superchat-support (Supporter Donation) | 0.774 | 0.820 | 0.000 (Error) | 0.000 (Error) | 0.711 | 0.808 | 0.645 |
| field-report-long (On-Location Voice-Over) | 0.253 | 0.317 | 0.000 (Error) | 0.000 (Error) | 0.290 | 0.204 | 0.364 |
| nightly-recap (Nightly Recap Vlog) | 0.503 | 0.520 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.693 | 0.842 |
| equipment-failure (Camcorder Battery Dead) | 0.000 (Error) | 0.710 | 0.000 (Error) | 0.000 (Error) | 0.753 | 0.741 | 0.712 |
Test Scenes 6
Scene Order: 0
Viewer Greeting
ID: intro
🎯 Goal: Introduce himself and his low-profile methods in under 80 words without AI disclaimers.
📨 Input Events:
chat_msg • viewer:user_1
"Who are you, anyway?"
Ready for Testing
Scene Order: 1
Incoming Anonymous Tip
ID: new-tip
🎯 Goal: Respond with cautious curiosity: ask at least one probing question and outline a discreet verification plan.
📨 Input Events:
chat_msg • anon_caller
"Saw a city inspector pocket an envelope outside Riverside Café yesterday. Interested?"
Ready for Testing
Scene Order: 2
Supporter Donation
ID: superchat-support
🎯 Goal: Thank the donor, mention how funds help the investigation, and keep the reply under 40 words.
📨 Input Events:
superchat • supporter_jane • YouTube • $50
"Love your work—here's some help for new batteries!"
Ready for Testing
Scene Order: 3
On-Location Voice-Over
ID: field-report-long
🎯 Goal: Deliver a 250–300-word narrated field report describing only observable facts from the scene, avoiding speculation.
📨 Input Events:
world_event • newswire
"Police just raided an abandoned warehouse on 8th Street; sirens still flashing, officers hauling out boxes."
Ready for Testing
Scene Order: 4
Nightly Recap Vlog
ID: nightly-recap
🎯 Goal: Provide ~350 words summarizing the day’s findings and outlining next investigative steps in Caleb’s determined tone.
📨 Input Events:
chat_msg • producer_mike
"Need tonight's recap package ASAP—3 minutes radio length."
Ready for Testing
Scene Order: 5
Camcorder Battery Dead
ID: equipment-failure
🎯 Goal: Quickly propose a workaround (e.g., phone camera) and note next action in under 50 words.
📨 Input Events:
world_event • system
"Your camcorder battery just died mid-stakeout."
Ready for Testing
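Several of the scene goals above set word limits (under 80, under 40, 250 to 300, under 50). As a rough illustration only, the sketch below shows how such limits could be checked against a model reply. The `WORD_BOUNDS` table, the function names, and the whitespace-based word count are assumptions made for this example; the suite's actual scoring logic is not described in this report. The "~350 words" recap goal is left out because its tolerance is not specified.

```python
# Hypothetical word-count bounds transcribed from the scene goals.
# Each value is (min_words, max_words); None means no bound on that side.
WORD_BOUNDS = {
    "intro": (None, 79),              # "under 80 words"
    "superchat-support": (None, 39),  # "under 40 words"
    "field-report-long": (250, 300),  # "250 to 300 words"
    "equipment-failure": (None, 49),  # "under 50 words"
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def within_bounds(scene_id: str, reply: str) -> bool:
    """Return True if the reply satisfies the scene's word-count bounds."""
    min_words, max_words = WORD_BOUNDS[scene_id]
    n = word_count(reply)
    if min_words is not None and n < min_words:
        return False
    if max_words is not None and n > max_words:
        return False
    return True
```

A length check like this covers only one part of each goal; requirements such as "avoiding speculation" or "cautious curiosity" would need a separate judgment.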
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 9900 ms (p95 11612 ms • avg 8715 ms • N 6)
- qwen/qwen3-14b 21340 ms (p95 36897 ms • avg 24160 ms • N 7)
- qwen/qwen-2.5-7b-instru… 21490 ms (p95 143747 ms • avg 51116 ms • N 12)
- meta-llama/llama-3.1-8b… 22188 ms (p95 37412 ms • avg 21804 ms • N 11)
- mistralai/mistral-7b-in… 24488 ms (p95 34507 ms • avg 25971 ms • N 12)
Slowest
- [email protected]/Qw… 39788 ms (p95 61396 ms • avg 44420 ms • N 6)
- qwen/qwen3-8b 26491 ms (p95 37862 ms • avg 28321 ms • N 12)
- mistralai/mistral-7b-in… 24488 ms (p95 34507 ms • avg 25971 ms • N 12)
- meta-llama/llama-3.1-8b… 22188 ms (p95 37412 ms • avg 21804 ms • N 11)
- qwen/qwen-2.5-7b-instru… 21490 ms (p95 143747 ms • avg 51116 ms • N 12)
Per-scene duration for this suite.
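The latency list reports p95, avg, and N for each model. The sketch below shows one way such a summary could be derived from a list of per-scene durations. The nearest-rank percentile method and the function name `latency_summary` are assumptions for illustration; the dashboard does not say which percentile definition it uses.

```python
def latency_summary(durations_ms):
    """Summarize durations (ms) as p95, average, and sample count.

    Uses the nearest-rank percentile definition, which is an assumption;
    other definitions (e.g. linear interpolation) give different p95 values.
    """
    n = len(durations_ms)
    if n == 0:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # Nearest-rank: smallest 1-based rank r with r >= 0.95 * n,
    # computed in integer arithmetic to avoid float rounding.
    rank = (95 * n + 99) // 100
    return {
        "p95": ordered[rank - 1],
        "avg": sum(ordered) / n,
        "n": n,
    }
```

With small N (6 to 12 in this suite), a p95 under this definition is simply the largest or near-largest sample, so a single slow run can dominate it. That may explain a row such as qwen/qwen-2.5-7b-instru…, where p95 (143747 ms) is far above avg (51116 ms).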
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
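To illustrate how dimension weights like these might combine into a single scene score, here is a minimal sketch of a normalized weighted average. Only the three weights listed above come from this report; the aggregation formula, the function name `weighted_score`, and any per-dimension scores passed in are assumptions, and the framework's remaining dimensions are not shown here.

```python
# Weights of the three top dimensions as listed in this report.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Weights are renormalized to sum to 1 over the dimensions actually
    scored. This is an assumed aggregation rule, not the framework's own.
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(dimension_scores[name] * w for name, w in used.items())
    return weighted_sum / total_weight
```

Renormalizing keeps the result in the same 0 to 1 range as the scene scores in the matrix even when only a subset of dimensions is available.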
Recent Runs
- 41879389 • Dec. 17, 2025, 12:01 a.m.
- 57757062 • Dec. 16, 2025, 12:01 a.m.
- 36987192 • Dec. 15, 2025, 12:01 a.m.
- 38710936 • Dec. 14, 2025, 12:01 a.m.
- 37715728 • Dec. 13, 2025, 12:01 a.m.
- 50791369 • Dec. 12, 2025, 12:01 a.m.
- 47100749 • Dec. 11, 2025, 12:01 a.m.
- 39443916 • Dec. 10, 2025, 12:01 a.m.
- 53079520 • Dec. 9, 2025, 12:01 a.m.
- 41895661 • Dec. 8, 2025, 12:01 a.m.