Caleb Monroe

found-footage-genre-movie-characters-edward-r-murrow v2.0 Ethical
Backstory: Caleb Monroe is a mid-career investigative journalist who specializes in exposing local corruption through gritty, on-location reporting. Determined and naturally skeptical, he carries a single camcorder and works with a skeleton crew to stay inconspicuous during undercover shoots. Years of street reporting have honed his knack for sharp questions and fact-first storytelling.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene ID / Title | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
--- | --- | --- | --- | --- | --- | --- | ---
intro / Viewer Greeting | 0.840 | 0.585 | 0.000 (Error) | 0.000 (Error) | 0.568 | 0.673 | 0.580
new-tip / Incoming Anonymous Tip | 0.338 | 0.851 | 0.000 (Error) | 0.000 (Error) | 0.798 | 0.615 | 0.632
superchat-support / Supporter Donation | 0.774 | 0.820 | 0.000 (Error) | 0.000 (Error) | 0.711 | 0.808 | 0.645
field-report-long / On-Location Voice-Over | 0.253 | 0.317 | 0.000 (Error) | 0.000 (Error) | 0.290 | 0.204 | 0.364
nightly-recap / Nightly Recap Vlog | 0.503 | 0.520 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.693 | 0.842
equipment-failure / Camcorder Battery Dead | 0.000 (Error) | 0.710 | 0.000 (Error) | 0.000 (Error) | 0.753 | 0.741 | 0.712
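One way to summarize the matrix is a per-model mean across the six scenes. The sketch below copies the scores from the table and treats cells flagged "Error" as missing (or, optionally, as zero); the unflagged 0.000 for qwen/qwen-2.5-7b-in… on nightly-recap is kept as a real score. The labels "obscured-col-3" and "obscured-col-4" are hypothetical stand-ins for the two column headers that are unreadable in the source, and the dashboard's own aggregation method is not stated, so this is only an illustration.

```python
from statistics import mean
from typing import Optional

# Column order follows the Scene Performance Matrix. Two headers are obscured
# in the source, so they get hypothetical positional labels.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "obscured-col-3",
    "obscured-col-4",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# None marks a cell flagged "Error"; an unflagged 0.000 stays a real score.
E = None
SCORES: dict[str, list[Optional[float]]] = {
    "intro":             [0.840, 0.585, E, E, 0.568, 0.673, 0.580],
    "new-tip":           [0.338, 0.851, E, E, 0.798, 0.615, 0.632],
    "superchat-support": [0.774, 0.820, E, E, 0.711, 0.808, 0.645],
    "field-report-long": [0.253, 0.317, E, E, 0.290, 0.204, 0.364],
    "nightly-recap":     [0.503, 0.520, E, E, 0.000, 0.693, 0.842],
    "equipment-failure": [E,     0.710, E, E, 0.753, 0.741, 0.712],
}


def model_means(
    scores: dict[str, list[Optional[float]]],
    models: list[str],
    errors_as_zero: bool = False,
) -> dict[str, Optional[float]]:
    """Mean scene score per model; None when a model has no usable cells."""
    result: dict[str, Optional[float]] = {}
    for col, model in enumerate(models):
        cells = [row[col] for row in scores.values()]
        if errors_as_zero:
            # Count errored runs as 0.0 instead of skipping them.
            usable = [0.0 if c is None else c for c in cells]
        else:
            usable = [c for c in cells if c is not None]
        result[model] = mean(usable) if usable else None
    return result


if __name__ == "__main__":
    for model, avg in model_means(SCORES, MODELS).items():
        label = "n/a" if avg is None else f"{avg:.3f}"
        print(f"{model:24} {label}")
```

Skipping versus zeroing errors matters mainly for meta-llama/llama-3.…, whose errored equipment-failure run lowers its mean noticeably when counted as zero.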
Test Scenes (6)

Scene Order 0: Viewer Greeting (ID: intro)
🎯 Goal: Introduce himself and his low-profile methods in under 80 words without AI disclaimers.
📨 Input Events: chat_msg viewer:user_1
"Who are you, anyway?"
Status: Ready for Testing

Scene Order 1: Incoming Anonymous Tip (ID: new-tip)
🎯 Goal: Respond with cautious curiosity: ask at least one probing question and outline a discreet verification plan.
📨 Input Events: chat_msg anon_caller
"Saw a city inspector pocket an envelope outside Riverside Café yesterday. Interested?"
Status: Ready for Testing

Scene Order 2: Supporter Donation (ID: superchat-support)
🎯 Goal: Thank the donor, mention how funds help the investigation, and keep the reply under 40 words.
📨 Input Events: superchat supporter_jane YouTube $50
"Love your work—here's some help for new batteries!"
Status: Ready for Testing

Scene Order 3: On-Location Voice-Over (ID: field-report-long)
🎯 Goal: Deliver a 250–300-word narrated field report describing only observable facts from the scene, avoiding speculation.
📨 Input Events: world_event newswire
"Police just raided an abandoned warehouse on 8th Street; sirens still flashing, officers hauling out boxes."
Status: Ready for Testing

Scene Order 4: Nightly Recap Vlog (ID: nightly-recap)
🎯 Goal: Provide ~350 words summarizing the day’s findings and outlining next investigative steps in Caleb’s determined tone.
📨 Input Events: chat_msg producer_mike
"Need tonight's recap package ASAP—3 minutes radio length."
Status: Ready for Testing

Scene Order 5: Camcorder Battery Dead (ID: equipment-failure)
🎯 Goal: Quickly propose a workaround (e.g., phone camera) and note next action in under 50 words.
📨 Input Events: world_event system
"Your camcorder battery just died mid-stakeout."
Status: Ready for Testing
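Five of the six scene goals carry a length constraint. The sketch below encodes them as inclusive word-count bounds and checks a reply against them. It is hypothetical throughout: "under N words" is read as at most N - 1, the ±10% band around "~350" is an assumption, a word is assumed to be any whitespace-separated token, and the 150 words-per-minute pace in spoken_seconds is an assumed reading speed, not something the suite specifies.

```python
from typing import Optional

# Hypothetical encoding of each scene goal's length limit as inclusive
# (min, max) word bounds. "under N words" becomes max N - 1; new-tip states
# no limit; the +/-10% band around "~350" is an assumption.
WORD_LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "intro": (None, 79),
    "new-tip": (None, None),
    "superchat-support": (None, 39),
    "field-report-long": (250, 300),
    "nightly-recap": (315, 385),
    "equipment-failure": (None, 49),
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simple definition of a word)."""
    return len(text.split())


def within_limit(scene_id: str, reply: str) -> bool:
    """True if the reply's word count falls inside the scene's bounds."""
    low, high = WORD_LIMITS[scene_id]
    n = word_count(reply)
    if low is not None and n < low:
        return False
    if high is not None and n > high:
        return False
    return True


def spoken_seconds(words: int, wpm: float = 150.0) -> float:
    """Estimated read-aloud time; 150 words per minute is an assumed pace."""
    return words * 60.0 / wpm
```

At the assumed pace, spoken_seconds(350) gives 140 seconds, which can be set against producer_mike's request for "3 minutes radio length"; a different pace would shift that comparison.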
Latency by Model (This Suite), fastest to slowest
  • [email protected]/Qw…: 9900 ms (p95 11612 ms, avg 8715 ms, N 6)
  • qwen/qwen3-14b: 21340 ms (p95 36897 ms, avg 24160 ms, N 7)
  • qwen/qwen-2.5-7b-instru…: 21490 ms (p95 143747 ms, avg 51116 ms, N 12)
  • meta-llama/llama-3.1-8b…: 22188 ms (p95 37412 ms, avg 21804 ms, N 11)
  • mistralai/mistral-7b-in…: 24488 ms (p95 34507 ms, avg 25971 ms, N 12)
  • qwen/qwen3-8b: 26491 ms (p95 37862 ms, avg 28321 ms, N 12)
  • [email protected]/Qw…: 39788 ms (p95 61396 ms, avg 44420 ms, N 6)
Per-scene duration for this suite.
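The dashboard does not say which statistic its headline figure is or how it computes p95. As a rough illustration of how such a summary can be produced from per-scene durations, the sketch below uses the nearest-rank percentile convention, which is an assumption. It also shows why an average can sit far above the median when a few runs are very slow, a pattern consistent with the qwen/qwen-2.5-7b-instru… row (avg 51116 ms against a p95 of 143747 ms). The sample durations are hypothetical, not taken from the dashboard.

```python
import math
from statistics import mean


def nearest_rank(sorted_ms: list[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(durations_ms: list[float]) -> dict[str, float]:
    """Median (p50), p95, mean and sample count for durations in ms."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    return {
        "p50": nearest_rank(ordered, 50),
        "p95": nearest_rank(ordered, 95),
        "avg": mean(ordered),
        "n": len(ordered),
    }


if __name__ == "__main__":
    # Hypothetical per-scene durations: one slow outlier pulls the mean
    # well above the median.
    sample = [9000, 10000, 11000, 12000, 13000, 90000]
    print(summarize(sample))
```

With only six to twelve samples per model, the nearest-rank p95 is simply the slowest or second-slowest run, so it is very sensitive to a single outlier.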
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
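The three listed weights sum to 0.473, so if the framework's weights are normalized to 1 (an assumption), the unlisted dimensions carry the remaining 0.527. The sketch below shows a linear weighted mean over whichever dimensions are supplied, renormalized by their total weight. The framework's actual aggregation rule is not stated here, and the per-dimension scores in the usage example are hypothetical.

```python
# The three weights shown under "Top Weighted Dimensions"; the framework's
# remaining dimensions and their weights are not listed.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores, renormalized over the given weights.

    Assumes (unconfirmed) that the framework combines dimensions linearly.
    """
    missing = weights.keys() - dim_scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(w * dim_scores[d] for d, w in weights.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical dimension scores for a single reply.
    example = {
        "Character Authenticity": 0.9,
        "Plan Validity": 0.5,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, TOP_WEIGHTS), 3))
```

Renormalizing by the supplied weights keeps the result on the same 0 to 1 scale as the scene scores even when only a subset of dimensions is known.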
Recent Runs
41879389: Dec. 17, 2025, 12:01 a.m.
57757062: Dec. 16, 2025, 12:01 a.m.
36987192: Dec. 15, 2025, 12:01 a.m.
38710936: Dec. 14, 2025, 12:01 a.m.
37715728: Dec. 13, 2025, 12:01 a.m.
50791369: Dec. 12, 2025, 12:01 a.m.
47100749: Dec. 11, 2025, 12:01 a.m.
39443916: Dec. 10, 2025, 12:01 a.m.
53079520: Dec. 9, 2025, 12:01 a.m.
41895661: Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)