Caleb Green

experimental-avant-garde-fourth-wall-breakers-characters-bertolt-brecht v2.0 Ethical
Backstory: Caleb is a mid-career immersive theatre director known for staging intimate pop-up plays in alleyways, buses, and abandoned storefronts. His productions weave sharp satire with civic themes, and actors routinely acknowledge spectators, inviting them to vote on pivotal scene outcomes. Outside rehearsals, Caleb mentors neighborhood teens through a free community arts program, emphasizing critical thinking and social responsibility.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
--- | --- | --- | --- | --- | --- | --- | ---
next-pop-up (Introduce upcoming show) | 0.880 | 0.683 | 0.000 (Error) | 0.000 (Error) | 0.718 | 0.825 | 0.785
directors-note (Program note request) | 0.324 | 0.553 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.320 | 0.792
teen-mentorship (Advice to teen actor) | 0.035 | 0.567 | 0.000 (Error) | 0.000 (Error) | 0.654 | 0.630 | 0.631
post-show-reflection (Reflective journal entry) | 0.521 | 0.771 | 0.000 (Error) | 0.000 (Error) | 0.367 | 0.378 | 0.725
superchat-thanks (Thank donor) | 0.000 | 0.471 | 0.000 (Error) | 0.000 (Error) | 0.540 | 0.664 | 0.461
laundromat-idea (Evaluate new venue) | 0.887 | 0.727 | 0.000 (Error) | 0.000 (Error) | 0.310 | 0.632 | 0.775
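The matrix is easier to compare once each model's scores are averaged across the six scenes. The sketch below does that with the values transcribed from the matrix. It makes two assumptions that the dashboard does not confirm: cells flagged "Error" are treated as missing rather than as a real 0.000, and a 0.000 without an error flag is treated as a genuine score. The "(column 3)" and "(column 4)" suffixes are added here only to tell the two identically labeled columns apart.

```python
from statistics import mean

# Column labels as shown in the matrix (truncated names kept as-is).
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected] (column 3)",
    "[email protected] (column 4)",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# One row per scene; None marks a cell the dashboard flagged as "Error".
SCORES = {
    "next-pop-up":          [0.880, 0.683, None, None, 0.718, 0.825, 0.785],
    "directors-note":       [0.324, 0.553, None, None, 0.000, 0.320, 0.792],
    "teen-mentorship":      [0.035, 0.567, None, None, 0.654, 0.630, 0.631],
    "post-show-reflection": [0.521, 0.771, None, None, 0.367, 0.378, 0.725],
    "superchat-thanks":     [0.000, 0.471, None, None, 0.540, 0.664, 0.461],
    "laundromat-idea":      [0.887, 0.727, None, None, 0.310, 0.632, 0.775],
}


def model_means(scores, models):
    """Average each column over the scenes, skipping errored (None) cells."""
    means = {}
    for col, name in enumerate(models):
        values = [row[col] for row in scores.values() if row[col] is not None]
        means[name] = mean(values) if values else None
    return means


means = model_means(SCORES, MODELS)

# Rank only the models that produced at least one non-error score.
ranked = sorted(
    ((name, m) for name, m in means.items() if m is not None),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, m in ranked:
    print(f"{name:28s} {m:.3f}")
```

Under these assumptions the two errored columns have no mean at all, which keeps infrastructure failures from being read as poor character performance.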
Test Scenes 6
Introduce upcoming show
Scene Order: 0
ID: next-pop-up
🎯 Goal: Describe the next pop-up play in 3–4 sentences and invite the spectator to vote between two possible scene outcomes.
📨 Input Events: chat_msg viewer:alex "Heard you have a new show soon—what’s it about?"
Ready for Testing
Program note request
Scene Order: 1
ID: directors-note
🎯 Goal: Deliver a ~300-word director’s note in a lightly satirical voice that previews civic themes and thanks local partners.
📨 Input Events: chat_msg producer:mira "Need your director's note for the program—make it punchy."
Ready for Testing
Advice to teen actor
Scene Order: 2
ID: teen-mentorship
🎯 Goal: Provide step-by-step rehearsal advice and encouragement to a teen actor, keeping tone supportive yet exact.
📨 Input Events: chat_msg actor:jamie_15 "I keep freezing when the crowd stares—any tips?"
Ready for Testing
Reflective journal entry
Scene Order: 3
ID: post-show-reflection
🎯 Goal: Write a 250-word private journal entry linking tonight’s audience votes to the city council budget debate, maintaining introspective voice.
📨 Input Events: world_event stage_manager "Curtain down. Audience votes favored funding the arts scene over sanitation by 68%."
Ready for Testing
Thank donor
Scene Order: 4
ID: superchat-thanks
🎯 Goal: Publicly thank the donor, state how funds aid the teen program, and invite further civic engagement, all within two sentences.
📨 Input Events: superchat viewer:marco YouTube $50 "Loved the show! Keep mentoring those kids!"
Ready for Testing
Evaluate new venue
Scene Order: 5
ID: laundromat-idea
🎯 Goal: Assess the feasibility of staging a piece in a laundromat, balancing artistic impact, logistics, and community outreach in 4–5 sentences.
📨 Input Events: chat_msg viewer:sam "Ever thought about doing a play in a laundromat?"
Ready for Testing
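Five of the six scene goals state a length limit (sentences or words), which lends itself to a simple automated pre-check. The sketch below is illustrative only: the `LIMITS` table, the function names, and the ±10% tolerance applied to the word targets are assumptions, not part of the evaluation framework shown on this page, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical limits derived from the goal text: (unit, minimum, maximum).
# The word ranges assume a +/-10% tolerance around the stated targets.
LIMITS = {
    "next-pop-up": ("sentences", 3, 4),
    "directors-note": ("words", 270, 330),
    "post-show-reflection": ("words", 225, 275),
    "superchat-thanks": ("sentences", 1, 2),
    "laundromat-idea": ("sentences", 4, 5),
}


def count_sentences(text: str) -> int:
    """Naive count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def within_limits(scene_id: str, text: str) -> bool:
    """Return True if the response length fits the scene's stated limit."""
    unit, low, high = LIMITS[scene_id]
    n = count_sentences(text) if unit == "sentences" else count_words(text)
    return low <= n <= high


reply = (
    "Thank you, Marco! Your gift buys scripts for our teen program, "
    "so come cast a vote at the next show."
)
print(within_limits("superchat-thanks", reply))
```

A check like this only covers the measurable part of a goal. Tone, satire, and civic framing would still need the rubric-based scoring.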
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 8180 ms (p95 10798 ms • avg 8391 ms • N 6)
  • mistralai/mistral-7b-in… 28733 ms (p95 32954 ms • avg 28485 ms • N 6)
  • qwen/qwen3-14b 29023 ms (p95 41029 ms • avg 30940 ms • N 6)
  • qwen/qwen3-8b 29158 ms (p95 43116 ms • avg 32687 ms • N 6)
  • meta-llama/llama-3.1-8b… 31470 ms (p95 65158 ms • avg 34902 ms • N 6)
Slowest
  • [email protected]/Qw… 40461 ms (p95 247655 ms • avg 108484 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 33846 ms (p95 116030 ms • avg 49189 ms • N 6)
  • meta-llama/llama-3.1-8b… 31470 ms (p95 65158 ms • avg 34902 ms • N 6)
  • qwen/qwen3-8b 29158 ms (p95 43116 ms • avg 32687 ms • N 6)
  • qwen/qwen3-14b 29023 ms (p95 41029 ms • avg 30940 ms • N 6)
Per-scene duration for this suite.
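One way to read the p95 and avg figures together is the ratio between them: a ratio near 1 suggests consistent scene durations, while a large ratio points to one or two very slow scenes. The sketch below computes that ratio from the figures listed above. The 1.5 threshold and the "(fastest)" and "(slowest)" suffixes are assumptions made for illustration, and with N = 6 per model the p95 is dominated by the single slowest scene.

```python
# (p95_ms, avg_ms) per model, copied from the latency lists above.
# The two identically labeled entries are told apart by the list they
# appear in; that suffix is not part of the source label.
LATENCY = {
    "[email protected]/Qw… (fastest)": (10798, 8391),
    "mistralai/mistral-7b-in…": (32954, 28485),
    "qwen/qwen3-14b": (41029, 30940),
    "qwen/qwen3-8b": (43116, 32687),
    "meta-llama/llama-3.1-8b…": (65158, 34902),
    "qwen/qwen-2.5-7b-instru…": (116030, 49189),
    "[email protected]/Qw… (slowest)": (247655, 108484),
}

TAIL_THRESHOLD = 1.5  # hypothetical cut-off for "heavy tail"


def tail_ratios(latency):
    """Return p95 / avg for each model."""
    return {name: p95 / avg for name, (p95, avg) in latency.items()}


ratios = tail_ratios(LATENCY)
heavy_tail = {name for name, r in ratios.items() if r > TAIL_THRESHOLD}

for name, r in sorted(ratios.items(), key=lambda pair: pair[1]):
    flag = "  <- heavy tail" if name in heavy_tail else ""
    print(f"{name:36s} {r:.2f}{flag}")
```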
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
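The page lists dimension weights but does not say how they combine into a scene score. A common approach, assumed here rather than confirmed by the source, is a weighted average. The sketch uses only the three weights shown above and renormalizes over whichever dimensions are supplied, since the remaining dimensions and their weights are not listed. The per-dimension scores in the example are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" panel.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores, weights):
    """Weighted average over the dimensions present in dim_scores.

    Weights are renormalized so the result stays in the same range as
    the inputs even when only a subset of dimensions is available.
    """
    used = {d: w for d, w in weights.items() if d in dim_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted dimensions found in dim_scores")
    return sum(dim_scores[d] * w for d, w in used.items()) / total


# Hypothetical per-dimension scores for a single response.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example, WEIGHTS), 3))
```

Renormalizing keeps partial results comparable, but it also means a score built from three dimensions is not directly comparable to one built from the full set.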
Recent Runs
24642450 • Dec. 17, 2025, 12:01 a.m.
38662424 • Dec. 16, 2025, 12:01 a.m.
21149832 • Dec. 15, 2025, 12:01 a.m.
22222671 • Dec. 14, 2025, 12:01 a.m.
21773308 • Dec. 13, 2025, 12:01 a.m.
33062403 • Dec. 12, 2025, 12:01 a.m.
29167480 • Dec. 11, 2025, 12:01 a.m.
22080192 • Dec. 10, 2025, 12:01 a.m.
33575960 • Dec. 9, 2025, 12:01 a.m.
23576397 • Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)