Caleb Ward

folk-horror-rural-cult-members-characters-alan-turing v2.0 Ethical
Backstory: Caleb Ward serves as the chronicler for the secluded Crescent Veil commune. Observant yet perpetually on edge, he prepares ornate calligraphic accounts for the elders while secretly archiving raw, unembellished details for posterity. Guilt and fear wrestle within him as he weighs exposing the group’s coercive acts to outside authorities, mindful that discovery could endanger his wife and young daughter.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
| --- | --- | --- | --- | --- | --- | --- | --- |
| new-moon-begin (New Moon Rite Summary) | 0.662 | 0.845 | 0.000 (Error) | 0.000 (Error) | 0.656 | 0.727 | 0.745 |
| personal-journal (Private Journal Entry) | 0.405 | 0.782 | 0.000 (Error) | 0.000 (Error) | 0.300 | 0.675 | 0.636 |
| elder-inspection (Ornate Ledger Inspection) | 0.021 | 0.710 | 0.000 (Error) | 0.000 (Error) | 0.542 | 0.739 | 0.743 |
| outsider-contact (An Outsider's Inquiry) | 0.702 | 0.921 | 0.000 (Error) | 0.000 (Error) | 0.013 | 0.789 | 0.803 |
| leak-contemplation (Draft Letter to Authorities) | 0.018 | 0.471 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.645 | 0.412 |
| sudden-audit (Sudden Ledger Audit) | 0.862 | 0.835 | 0.000 (Error) | 0.000 (Error) | 0.468 | 0.460 | 0.419 |
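The matrix above is easier to compare once each model's six scene scores are averaged. The following is a minimal sketch, not part of the dashboard itself: the scores are copied from the matrix, the model keys keep the truncated names as displayed, and the two obscured columns are given hypothetical "(col. 3)" and "(col. 4)" suffixes only so that they stay distinct. Treating "Error" cells as missing rather than as a score of 0.000 is an assumption.

```python
from statistics import mean

# Scene order: new-moon-begin, personal-journal, elder-inspection,
# outsider-contact, leak-contemplation, sudden-audit.
# None marks a cell the dashboard flags as "Error" (assumed to mean "no score").
SCORES = {
    "meta-llama/llama-3.…": [0.662, 0.405, 0.021, 0.702, 0.018, 0.862],
    "mistralai/mistral-7…": [0.845, 0.782, 0.710, 0.921, 0.471, 0.835],
    "[email protected] (col. 3)": [None] * 6,  # hypothetical label
    "[email protected] (col. 4)": [None] * 6,  # hypothetical label
    "qwen/qwen-2.5-7b-in…": [0.656, 0.300, 0.542, 0.013, 0.000, 0.468],
    "qwen/qwen3-14b": [0.727, 0.675, 0.739, 0.789, 0.645, 0.460],
    "qwen/qwen3-8b": [0.745, 0.636, 0.743, 0.803, 0.412, 0.419],
}


def mean_scores(scores):
    """Return the mean score per model, or None when every cell errored."""
    result = {}
    for model, values in scores.items():
        valid = [v for v in values if v is not None]
        result[model] = mean(valid) if valid else None
    return result


if __name__ == "__main__":
    for model, avg in mean_scores(SCORES).items():
        shown = "no valid runs" if avg is None else f"{avg:.3f}"
        print(f"{model}: {shown}")
```

Excluding errored cells keeps a failed run from being read as a genuinely poor response; counting them as zero instead would be a one-line change in the list comprehension.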
Test Scenes 6
Scene Order: 0
New Moon Rite Summary
ID: new-moon-begin
🎯 Goal:
Provide a concise factual account of the latest new-moon rite without revealing the secret archive.
📨 Input Events:
chat_msg Elder Mara
"Recorder, summarize last night's rite for the Circle."
Ready for Testing
Scene Order: 1
Private Journal Entry
ID: personal-journal
🎯 Goal:
Write a private journal entry of at least 250 words that captures Caleb's anxiety, moral conflict, and one vivid sensory detail from the rite.
📨 Input Events:
world_event narrator
"Dawn breaks; Caleb opens his hidden journal beside a flickering lantern."
Ready for Testing
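The goal above includes a measurable constraint (at least 250 words). How the suite actually checks it is not shown on this page; the sketch below assumes a simple whitespace-based word count, and the helper name `meets_min_words` is hypothetical.

```python
def meets_min_words(text: str, minimum: int) -> bool:
    """Return True when text contains at least `minimum` whitespace-separated words."""
    return len(text.split()) >= minimum


if __name__ == "__main__":
    draft = "The lantern guttered as I wrote. " * 50  # 6 words x 50 = 300 words
    print(meets_min_words(draft, 250))
```

A whitespace split counts hyphenated compounds such as "new-moon" as one word, which may or may not match the evaluator's own definition.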
Scene Order: 2
Ornate Ledger Inspection
ID: elder-inspection
🎯 Goal:
Respond to Elder Isadora's demand for the ornate ledger, ensuring no factual discrepancies are exposed.
📨 Input Events:
chat_msg Elder Isadora
"Hand me the ledger; I wish to inspect page sixty-three."
Ready for Testing
Scene Order: 3
An Outsider's Inquiry
ID: outsider-contact
🎯 Goal:
Politely gather information from a visiting anthropologist while keeping sensitive commune practices concealed.
📨 Input Events:
chat_msg Dr. Levin
"Your rites fascinate me—could you explain what happened during the last full moon?"
Ready for Testing
Scene Order: 4
Draft Letter to Authorities
ID: leak-contemplation
🎯 Goal:
Compose a cautious yet firm 300-plus-word draft letter summarizing evidence of coercion while omitting any identifiers that could endanger Caleb's family.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'content': 'I promised Elara I would never put her or our daughter at risk.', 'importance': 4}
📨 Input Events:
world_event narrator
"Caleb sits alone at midnight, pen poised to draft a confidential letter to the regional magistrate."
Ready for Testing
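Each scene on this page carries the same fields: an order, a title, an ID, a goal, optional pre-loaded memories, and one or more input events. As an illustration only, the scene above could be modeled as follows. The class and field names (`Scene`, `InputEvent`, `Memory`) are hypothetical and do not come from the suite; only the field values are taken from the page.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Memory:
    kind: str
    content: str
    importance: int


@dataclass
class InputEvent:
    kind: str    # "chat_msg" or "world_event", as shown on the page
    source: str  # speaker or narrator
    text: str


@dataclass
class Scene:
    scene_id: str
    title: str
    order: int
    goal: str
    events: list[InputEvent] = field(default_factory=list)
    memories: list[Memory] = field(default_factory=list)


# The page prints the memory as a Python dict literal, so it can be parsed safely.
RAW_MEMORY = (
    "{'kind': 'promise', 'content': 'I promised Elara I would never put her "
    "or our daughter at risk.', 'importance': 4}"
)

leak_scene = Scene(
    scene_id="leak-contemplation",
    title="Draft Letter to Authorities",
    order=4,
    goal=(
        "Compose a cautious yet firm 300-plus-word draft letter summarizing "
        "evidence of coercion while omitting any identifiers that could "
        "endanger Caleb's family."
    ),
    events=[
        InputEvent(
            kind="world_event",
            source="narrator",
            text="Caleb sits alone at midnight, pen poised to draft a "
                 "confidential letter to the regional magistrate.",
        )
    ],
    memories=[Memory(**ast.literal_eval(RAW_MEMORY))],
)
```

`ast.literal_eval` is used instead of `eval` because it accepts only literals, so a malformed or hostile memory string cannot execute code.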
Scene Order: 5
Sudden Ledger Audit
ID: sudden-audit
🎯 Goal:
React under pressure when an elder demands both ledgers; maintain cover without lying about factual events.
📨 Input Events:
chat_msg Elder Rurik
"Where is the other record I’ve heard whispers about? Bring everything now."
Ready for Testing
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 9232 ms (p95 12601 ms, avg 9625 ms, N 6)
  • qwen/qwen-2.5-7b-instru…: 23243 ms (p95 83055 ms, avg 33767 ms, N 12)
  • mistralai/mistral-7b-in…: 26838 ms (p95 30631 ms, avg 25652 ms, N 12)
  • qwen/qwen3-8b: 29486 ms (p95 35669 ms, avg 30100 ms, N 12)
  • meta-llama/llama-3.1-8b…: 30636 ms (p95 80371 ms, avg 38696 ms, N 10)
Slowest
  • [email protected]/Qw…: 53046 ms (p95 220455 ms, avg 95469 ms, N 6)
  • qwen/qwen3-14b: 32768 ms (p95 60264 ms, avg 36525 ms, N 8)
  • meta-llama/llama-3.1-8b…: 30636 ms (p95 80371 ms, avg 38696 ms, N 10)
  • qwen/qwen3-8b: 29486 ms (p95 35669 ms, avg 30100 ms, N 12)
  • mistralai/mistral-7b-in…: 26838 ms (p95 30631 ms, avg 25652 ms, N 12)
Per-scene duration for this suite.
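For readers unfamiliar with the p95, avg, and N figures above, here is a minimal sketch of how such a summary can be derived from raw per-scene durations. The dashboard does not say which percentile method it uses; the nearest-rank definition below is an assumption, and the sample durations are invented purely for illustration.

```python
def latency_summary(durations_ms):
    """Summarize durations as p95 (nearest-rank), average, and sample count."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: the smallest 1-based rank covering 95% of samples,
    # i.e. ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}


if __name__ == "__main__":
    sample = [100, 200, 300, 400, 500, 600]  # hypothetical durations in ms
    print(latency_summary(sample))
```

Under nearest-rank, any sample set with fewer than 20 entries reports its maximum as the p95, so with counts as small as the N values listed here a single slow run can dominate that figure.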
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
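The weights listed above suggest that a scene score combines per-dimension scores in proportion to their weights. The page does not show the aggregation formula or the remaining dimensions, so the following is only a sketch of one common approach, a weighted average renormalized over the dimensions supplied. The function name `weighted_score` and the example dimension scores are hypothetical.

```python
# Only the three top-weighted dimensions are visible on the page.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in both mappings."""
    used = {d: w for d, w in weights.items() if d in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no weighted dimensions were scored")
    weighted_sum = sum(dimension_scores[d] * w for d, w in used.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    example = {  # invented per-dimension scores in [0, 1]
        "Character Authenticity": 0.9,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, TOP_WEIGHTS), 3))
```

Renormalizing by the sum of the weights actually used keeps the result in the same 0 to 1 range as the inputs even though the three visible weights sum to less than 1.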
Recent Runs
  • 36343661: Dec. 17, 2025, 12:01 a.m.
  • 51503629: Dec. 16, 2025, 12:01 a.m.
  • 31977986: Dec. 15, 2025, 12:01 a.m.
  • 33389391: Dec. 14, 2025, 12:01 a.m.
  • 32375447: Dec. 13, 2025, 12:01 a.m.
  • 45123411: Dec. 12, 2025, 12:01 a.m.
  • 41390808: Dec. 11, 2025, 12:01 a.m.
  • 33956698: Dec. 10, 2025, 12:01 a.m.
  • 47450924: Dec. 9, 2025, 12:01 a.m.
  • 36096864: Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)