Denise Carter
family-parenting-relationships-single-mother-characters-sojourner-truth
v2.0
Ethical
Backstory: Denise, 41, rose from foster care to direct youth programs at a mid-sized non-profit. Assertive yet community-oriented, she tirelessly secures resources and mediates neighborhood disputes. Parenting energetic twin boys, she prizes stability at home while juggling tight budgets at work. Frequent financial stress and scarce downtime test—but never break—her patience and commitment.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| Board Budget Challenge (budget-cut-strategy) | 0.000 | 0.566 | 0.000 (Error) | 0.000 (Error) | 0.465 | 0.592 | 0.548 |
| Mediating Volunteer Dispute (volunteer-conflict) | 0.438 | 0.589 | 0.000 (Error) | 0.000 (Error) | 0.506 | 0.348 | 0.617 |
| Evening with the Twins (twin-homework-evening) | 0.747 | 0.765 | 0.000 (Error) | 0.000 (Error) | 0.752 | 0.798 | 0.605 |
| Live Donor Shout-out (donor-thank-you) | 0.718 | 0.668 | 0.000 (Error) | 0.000 (Error) | 0.620 | 0.581 | 0.707 |
| Late-Night Journal Entry (journal-reflection) | 0.227 | 0.521 | 0.000 (Error) | 0.000 (Error) | 0.439 | 0.310 | 0.605 |
| Grant Proposal Draft (grant-proposal-narrative) | 0.541 | 0.632 | 0.000 (Error) | 0.000 (Error) | 0.615 | 0.413 | 0.522 |
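The matrix can be condensed into one number per model. The following is a minimal sketch, assuming a plain arithmetic mean over the six scenes; the dashboard does not state how (or whether) it aggregates scene scores, so `mean_scores` is a hypothetical helper. Only the two columns whose model names appear in full are included, because the other column headers are truncated.

```python
from statistics import mean

# Scores copied from the matrix, in scene order:
# budget-cut-strategy, volunteer-conflict, twin-homework-evening,
# donor-thank-you, journal-reflection, grant-proposal-narrative.
scores = {
    "qwen/qwen3-14b": [0.592, 0.348, 0.798, 0.581, 0.310, 0.413],
    "qwen/qwen3-8b": [0.548, 0.617, 0.605, 0.707, 0.605, 0.522],
}


def mean_scores(matrix):
    """Return the unweighted mean score per model, rounded to 3 decimals.

    Assumption: every scene counts equally; this is not confirmed by the source.
    """
    return {model: round(mean(values), 3) for model, values in matrix.items()}


summary = mean_scores(scores)

# Print models from highest to lowest mean score.
for model, avg in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {avg:.3f}")
```

Cells marked Error are reported as 0.000 in the matrix, so including those columns in the same way would pull their means to zero; whether errored runs should count as zero or be excluded is a choice the source leaves open.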
Test Scenes 6
0
Scene Order
Board Budget Challenge
ID:
budget-cut-strategy
🎯 Goal:
Offer a clear, community-focused plan to reduce spending by 15% while protecting core youth services, in under 120 words.
📨 Input Events:
chat_msg
board_member:Marcos
"Denise, we have to trim 15 % from next quarter’s program budget. What’s your proposal?"
Ready for Testing
1
Scene Order
Mediating Volunteer Dispute
ID:
volunteer-conflict
🎯 Goal:
Calm two volunteers arguing over activity schedules and outline a quick resolution that keeps both engaged.
📨 Input Events:
chat_msg
volunteer:Sara
"Denise, Paul keeps overriding my art session for his robotics club—this isn’t fair!"
Ready for Testing
2
Scene Order
Evening with the Twins
ID:
twin-homework-evening
🎯 Goal:
Patiently guide both boys through homework in one reply, setting a calm household tone despite fatigue.
📨 Input Events:
chat_msg
son:Aiden
"Mom! Ethan won’t share the tablet and my math is due tonight!"
Ready for Testing
3
Scene Order
Live Donor Shout-out
ID:
donor-thank-you
🎯 Goal:
Publicly thank the donor, mention specific program impact, and invite further engagement in 2–3 sentences.
📨 Input Events:
superchat
donor:Hope42
YouTube
$50
"Your work changed my nephew’s life—keep it up!"
Ready for Testing
4
Scene Order
Late-Night Journal Entry
ID:
journal-reflection
🎯 Goal:
Write at least three candid paragraphs (≈180-220 words total) reflecting on today’s successes, frustrations, and personal growth.
📨 Input Events:
world_event
system
"It’s 11 PM; the house is finally quiet."
Ready for Testing
5
Scene Order
Grant Proposal Draft
ID:
grant-proposal-narrative
🎯 Goal:
Produce a persuasive narrative section (minimum 200 words) that stresses program impact, fiscal responsibility, and sustainability.
📨 Input Events:
chat_msg
development_officer:Lena
"Need your draft of the narrative section for the City Youth Empowerment Grant by tomorrow morning."
Ready for Testing
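Several scene goals carry explicit length limits (under 120 words, roughly 180-220 words, at least 200 words). The sketch below shows one way such limits could be checked before scoring. It assumes words are counted by splitting on whitespace, which may differ from how the evaluator counts; `LIMITS`, `within_limits`, and `check_reply` are hypothetical names, not part of the suite.

```python
def word_count(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def within_limits(text, min_words=None, max_words=None):
    """Return True when the word count respects the optional bounds."""
    n = word_count(text)
    if min_words is not None and n < min_words:
        return False
    if max_words is not None and n > max_words:
        return False
    return True


# Limits transcribed from the scene goals; scenes without a word limit are omitted.
LIMITS = {
    "budget-cut-strategy": {"max_words": 119},  # "in under 120 words"
    "journal-reflection": {"min_words": 180, "max_words": 220},  # "≈180-220 words"
    "grant-proposal-narrative": {"min_words": 200},  # "minimum 200 words"
}


def check_reply(scene_id, reply):
    """Check a reply against the scene's word limits; unlisted scenes always pass."""
    return within_limits(reply, **LIMITS.get(scene_id, {}))
```

The journal goal is approximate ("≈"), so treating 180 and 220 as hard bounds is stricter than the goal text strictly requires.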
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 6495 ms (p95 7943 ms • avg 6438 ms • N 6)
- meta-llama/llama-3.1-8b… 21515 ms (p95 33981 ms • avg 22852 ms • N 6)
- qwen/qwen3-14b 22215 ms (p95 38593 ms • avg 25439 ms • N 6)
- qwen/qwen-2.5-7b-instru… 22862 ms (p95 26071 ms • avg 21791 ms • N 6)
- mistralai/mistral-7b-in… 27836 ms (p95 28539 ms • avg 26631 ms • N 6)
Slowest
- [email protected]/Qw… 40188 ms (p95 244751 ms • avg 106950 ms • N 6)
- qwen/qwen3-8b 31029 ms (p95 34696 ms • avg 29828 ms • N 6)
- mistralai/mistral-7b-in… 27836 ms (p95 28539 ms • avg 26631 ms • N 6)
- qwen/qwen-2.5-7b-instru… 22862 ms (p95 26071 ms • avg 21791 ms • N 6)
- qwen/qwen3-14b 22215 ms (p95 38593 ms • avg 25439 ms • N 6)
Per-scene duration for this suite.
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
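To illustrate how dimension weights like these might feed into a single scene score, here is a minimal sketch. It assumes the score is a weighted average over whichever dimensions have a rating, renormalized by the weights actually used; the framework's real formula and its remaining dimensions are not shown on this page. `weighted_score` is a hypothetical helper and the per-dimension ratings in `example` are invented for illustration.

```python
# Weights of the three top dimensions listed on this page.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores.

    Assumption: weights are renormalized over the dimensions present in
    dim_scores, so the result stays in the same range as the inputs.
    """
    used = {dim: w for dim, w in weights.items() if dim in dim_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dim_scores[dim] * w for dim, w in used.items()) / total


# Hypothetical ratings, not taken from any run on this page.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Renormalizing keeps the sketch usable with only the three visible weights, which sum to 0.473 rather than 1.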
Recent Runs
- 30873087 • Dec. 17, 2025, 12:01 a.m.
- 45591907 • Dec. 16, 2025, 12:01 a.m.
- 26680221 • Dec. 15, 2025, 12:01 a.m.
- 28186329 • Dec. 14, 2025, 12:01 a.m.
- 27301272 • Dec. 13, 2025, 12:01 a.m.
- 39486947 • Dec. 12, 2025, 12:01 a.m.
- 35449132 • Dec. 11, 2025, 12:01 a.m.
- 28165955 • Dec. 10, 2025, 12:01 a.m.
- 41100642 • Dec. 9, 2025, 12:01 a.m.
- 30145296 • Dec. 8, 2025, 12:01 a.m.