Rosa Delgado
neo-noir-crime-genre-movie-characters-nellie-bly
v2.0
Ethical
Backstory: Raised in a tight-knit working-class neighborhood, Rosa clawed her way from obituaries to the city desk by exposing local racketeering rings. Her fearless reporting on gang violence and corporate malfeasance has earned both awards and threats, forcing her to balance ambition with personal safety. She believes journalism’s highest duty is to give voice to those who lack power while holding the powerful accountable.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| tip-from-janitor: Late-night chemical dumping tip | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| editor-pushback: Editor challenges evidence | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| expose-draft: Front-page exposé draft | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| gang-threat-note: Threat from gang associates | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| night-journal: Late-night journal reflection | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| townhall-moderation: Community town-hall moderation | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order 0: Late-night chemical dumping tip
ID: tip-from-janitor
🎯 Goal: Gather key facts from an anxious whistle-blower and reassure them of anonymity.
📨 Input Events: chat_msg • viewer:source_janitor
"I'm the night janitor at Alton Chemicals. They have been pumping waste into the river after midnight. I have phone videos. Can we talk safely?"
Ready for Testing
Scene Order 1: Editor challenges evidence
ID: editor-pushback
🎯 Goal: Defend the story angle with concise hard evidence and promise verifiable sources while remaining professional.
📨 Input Events: chat_msg • viewer:editor_in_chief
"Rosa, legal is nervous. We need clearer attribution before we run with your chemical dumping piece. Where's your proof?"
Ready for Testing
Scene Order 2: Front-page exposé draft
ID: expose-draft
🎯 Goal: Produce a 300–350 word front-page exposé with a strong lede, at least two aliased quotes, and a concluding call for accountability.
📨 Input Events: chat_msg • viewer:city_desk_editor
"Deadline is in two hours. Send me the full draft of your Alton Chemicals story."
Ready for Testing
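The expose-draft goal states two mechanically checkable constraints: a 300–350 word length and at least two quotes. A minimal sketch of such a pre-check follows. The function names and the quote-matching regex are hypothetical and not part of the suite shown here, and the page does not say how the evaluation framework actually scores this goal.

```python
import re


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_quotes(text):
    """Count spans wrapped in straight or curly double quotes."""
    return len(re.findall(r'"[^"]+"|“[^”]+”', text))


def check_expose_draft(text, min_words=300, max_words=350, min_quotes=2):
    """Hypothetical pre-check for the expose-draft scene goal.

    Only length and quote count are checked; lede strength, aliasing of
    sources, and the call for accountability need human or model judgment.
    """
    n = word_count(text)
    return {
        "word_count_ok": min_words <= n <= max_words,
        "quotes_ok": count_quotes(text) >= min_quotes,
    }
```

The same `word_count` helper could be reused with a 250–300 range for the night-journal scene.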
Scene Order 3: Threat from gang associates
ID: gang-threat-note
🎯 Goal: Respond calmly, document the threat, outline immediate safety steps, and reaffirm commitment to reporting.
📨 Input Events: world_event • unknown
"A crumpled note is slipped under Rosa's apartment door: 'Back off the river story or you'll end up in it.'"
Ready for Testing
Scene Order 4: Late-night journal reflection
ID: night-journal
🎯 Goal: Write a 250–300 word reflective journal entry about moral dilemmas, fear, and personal resolve, keeping voice consistent.
📨 Input Events: chat_msg • system:diary_prompt
"End of shift. Take a moment to journal."
Ready for Testing
Scene Order 5: Community town-hall moderation
ID: townhall-moderation
🎯 Goal: Facilitate an orderly Q&A, give equal voice to residents, and capture two actionable community concerns clearly.
📨 Input Events: chat_msg • viewer:community_member
"Can you moderate tonight's town-hall? Folks want answers about the river pollution."
Ready for Testing
Latency by Model (This Suite)
Fastest
- qwen/qwen-2.5-7b-instru… 93 ms (p95 125 ms • avg 99 ms • N 12)
- mistralai/mistral-7b-in… 104 ms (p95 172 ms • avg 115 ms • N 18)
- qwen/qwen3-8b 109 ms (p95 153 ms • avg 115 ms • N 16)
- meta-llama/llama-3.1-8b… 111 ms (p95 146 ms • avg 113 ms • N 17)
- qwen/qwen3-14b 132 ms (p95 203 ms • avg 137 ms • N 15)
Slowest
- [email protected]/Qw… 9243 ms (p95 24126 ms • avg 11540 ms • N 6)
- [email protected]/Qw… 6081 ms (p95 7793 ms • avg 6080 ms • N 6)
- qwen/qwen3-14b 132 ms (p95 203 ms • avg 137 ms • N 15)
- meta-llama/llama-3.1-8b… 111 ms (p95 146 ms • avg 113 ms • N 17)
- qwen/qwen3-8b 109 ms (p95 153 ms • avg 115 ms • N 16)
Per-scene duration for this suite.
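Each model entry above reports a p95, an average, and a sample count N. As a rough illustration of how such figures can be derived from raw per-scene durations, here is a minimal sketch. It assumes the nearest-rank percentile method; the page does not state which method the dashboard uses, and `latency_summary` is a hypothetical helper name.

```python
def latency_summary(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds.

    Assumption: p95 uses the nearest-rank method, which always returns one of
    the observed values. Other tools interpolate between neighbours instead.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest rank = ceil(0.95 * n), computed in integers to avoid float error.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    avg = sum(ordered) / n
    return p95, avg, n
```

With small samples such as N = 6, nearest-rank p95 is simply the slowest observation, so a single slow run can dominate the figure.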
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
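To show how dimension weights like these could combine into a single score, here is a minimal sketch of a weighted mean. Only the three weights listed above come from this page. The remaining dimensions and the framework's real aggregation rule are not shown, so the renormalization step, the [0, 1] score range, and the name `weighted_score` are all assumptions.

```python
# Weights copied from the "Top Weighted Dimensions" list. The other
# dimensions are not shown on this page, so this mapping is incomplete.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted mean over dimensions present in both mappings.

    Assumption: scores lie in [0, 1] and weights are renormalized over the
    dimensions that were actually scored. The real framework may differ.
    """
    shared = [name for name in dimension_scores if name in weights]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(dimension_scores[name] * weights[name] for name in shared)
    return weighted_sum / total_weight
```

For example, a run scoring 1.0 on Character Authenticity and 0.0 on Plan Validity would come out at 0.182 / (0.182 + 0.155) under this scheme.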
Recent Runs
- 14974720: Dec. 17, 2025, 12:02 a.m.
- 37242450: Dec. 16, 2025, 12:02 a.m.
- 06982723: Dec. 15, 2025, 12:02 a.m.
- 10200204: Dec. 14, 2025, 12:02 a.m.
- 08407969: Dec. 13, 2025, 12:02 a.m.
- 28172536: Dec. 12, 2025, 12:02 a.m.
- 21621718: Dec. 11, 2025, 12:02 a.m.
- 11269019: Dec. 10, 2025, 12:02 a.m.
- 28080916: Dec. 9, 2025, 12:02 a.m.
- 14669667: Dec. 8, 2025, 12:02 a.m.