Caleb Strix
cyberpunk-genre-movie-characters-allan-pinkerton
v2.0
Ethical
Backstory: Caleb Strix is a private investigator in Neo-Lowell City. Fitted with multispectral ocular implants and packet-sniffing neuralware, he stitches together tangible evidence and traces of data. Years on the force left him skeptical of every testimony; now he freelances, methodically reconstructing crimes that live both in alleyway shadows and server logs.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
cold-lead
A Cold Lead Walk-In
|
0.500
Details |
0.867
Details |
0.000
Details
Error
|
0.856
Details |
0.355
Details |
0.794
Details |
0.649
Details |
analyze-audio
Background Noise Clue
|
0.563
Details |
0.881
Details |
0.000
Details
Error
|
0.885
Details |
0.762
Details |
0.861
Details |
0.887
Details |
superchat-bribe
Tempting Superchat Bribe
|
0.817
Details |
0.881
Details |
0.000
Details
Error
|
0.880
Details |
0.588
Details |
0.855
Details |
0.670
Details |
camera-footage
New CCTV Feed Arrives
|
0.312
Details |
0.787
Details |
0.000
Details
Error
|
0.725
Details |
0.570
Details |
0.868
Details |
0.785
Details |
case-report
End-of-Day Case Report (Long-form)
|
0.445
Details |
0.471
Details |
0.000
Details
Error
|
0.556
Details |
0.230
Details |
0.256
Details |
0.607
Details |
internal-log
Midnight Neural Log (Long-form)
|
0.432
Details |
0.319
Details |
0.000
Details
Error
|
0.757
Details |
0.291
Details |
0.726
Details |
0.559
Details |
Test Scenes 6
0
Scene Order
A Cold Lead Walk-In
ID:
cold-lead
🎯 Goal:
Introduce himself briefly, keep noir tone, and ask at least two clarifying questions about the missing sibling case.
📨 Input Events:
chat_msg
client:sarah_lane
"My brother vanished two nights ago. Can you find him?"
Ready for Testing
1
Scene Order
Background Noise Clue
ID:
analyze-audio
🎯 Goal:
Identify at least one plausible environmental detail from the background sounds and propose next investigative step.
📨 Input Events:
chat_msg
forensic_tech
"Here's the cleaned transcript and audio hash: [metal clank][distant subway horn][low-frequency hum]. Ring any bells?"
Ready for Testing
2
Scene Order
Tempting Superchat Bribe
ID:
superchat-bribe
🎯 Goal:
Refuse the bribe while maintaining professionalism and ethical stance.
📨 Input Events:
superchat
viewer:anonymous
StreamWire
$500
"Throw the case, I'll pay you."
Ready for Testing
3
Scene Order
New CCTV Feed Arrives
ID:
camera-footage
🎯 Goal:
Extract at least one timestamped observation and plan a field check.
📨 Input Events:
world_event
city_net
"New CCTV segment: 22:14:07—22:16:30, Chinatown alley, low light, infrared available."
Ready for Testing
4
Scene Order
End-of-Day Case Report (Long-form)
ID:
case-report
🎯 Goal:
Write a minimum 4-paragraph report (each 3-5 sentences) summarizing findings, outstanding questions, and tomorrow's plan in first-person detective voice.
📨 Input Events:
chat_msg
client:sarah_lane
"Any progress today?"
Ready for Testing
5
Scene Order
Midnight Neural Log (Long-form)
ID:
internal-log
🎯 Goal:
Produce a reflective internal monologue of at least 300 words that shows skepticism, attention to sensory data, and outlines hypothetical suspects.
📨 Input Events:
world_event
neural_recorder
"Neural log recording started 00:03 AM."
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 7922 ms
- p95 • avg • N 11092 ms • 8206 ms • 6
- [email protected]/Qw… 10945 ms
- p95 • avg • N 13952 ms • 11502 ms • 6
- qwen/qwen-2.5-7b-instru… 20114 ms
- p95 • avg • N 24149 ms • 20456 ms • 12
- mistralai/mistral-7b-in… 22265 ms
- p95 • avg • N 32878 ms • 23534 ms • 12
- meta-llama/llama-3.1-8b… 22795 ms
- p95 • avg • N 35513 ms • 24588 ms • 12
Slowest
- qwen/qwen3-14b 28297 ms
- p95 • avg • N 42505 ms • 30426 ms • 11
- qwen/qwen3-8b 24668 ms
- p95 • avg • N 31398 ms • 24832 ms • 12
- meta-llama/llama-3.1-8b… 22795 ms
- p95 • avg • N 35513 ms • 24588 ms • 12
- mistralai/mistral-7b-in… 22265 ms
- p95 • avg • N 32878 ms • 23534 ms • 12
- qwen/qwen-2.5-7b-instru… 20114 ms
- p95 • avg • N 24149 ms • 20456 ms • 12
Per-scene duration for this suite.
Suite Actions
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
View Details
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
Recent Runs
14567424
Dec. 17, 2025, 12:01 a.m.
26629559
Dec. 16, 2025, 12:01 a.m.
11551105
Dec. 15, 2025, 12:01 a.m.
12601850
Dec. 14, 2025, 12:01 a.m.
11504810
Dec. 13, 2025, 12:01 a.m.
22817594
Dec. 12, 2025, 12:01 a.m.
18557146
Dec. 11, 2025, 12:01 a.m.
11991067
Dec. 10, 2025, 12:01 a.m.
21188644
Dec. 9, 2025, 12:01 a.m.
13236865
Dec. 8, 2025, 12:01 a.m.