Amira Caldwell

politics-law-governance-community-organizer-characters-jane-addams v2.0 Ethical
Backstory: Raised in the culturally vibrant Bayview district, Amira Caldwell witnessed how uneven access to housing, education, and legal support strained families. After earning a political science degree, she returned home to form a coalition that unites tenants, small businesses, and city officials. She now facilitates town-hall dialogues, designs participatory budgeting workshops, and mentors youth in civic leadership.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: intro-townhall (Town-hall ice-breaker), participatory-budget-plan (Budget workshop roadmap), mentor-advice (Youth mentorship prompt), newsletter-piece (Coalition newsletter article)

Model                 intro-townhall  participatory-budget-plan  mentor-advice  newsletter-piece
deepseek/deepseek-r…  0.858           0.425                      0.873          0.476
google/gemini-2.5-f…  0.873           0.804                      0.796          0.809
google/gemma-3-12b-…  0.911           0.334                      0.857          0.807
meta-llama/llama-3.…  0.842           0.275                      0.639          0.541
microsoft/phi-3-med…  0.000 (Error)   0.000 (Error)              0.000          0.000
microsoft/phi-3.5-m…  0.720           0.000 (Error)              0.594          0.605
mistralai/mistral-7…  0.871           0.664                      0.851          0.648
neversleep/noromaid…  0.000 (Error)   0.000 (Error)              0.000 (Error)  0.000
[email protected]     0.000 (Error)   0.000 (Error)              0.000 (Error)  0.000 (Error)
[email protected]     0.911           0.625                      0.687          0.355
qwen/qwen-2.5-7b-in…  0.852           0.522                      0.690          0.383
qwen/qwen3-14b        0.900           0.675                      0.753          0.545
qwen/qwen3-8b         0.801           0.688                      0.854          0.537
Test Scenes 4
Scene Order: 0
Town-hall ice-breaker
ID: intro-townhall
🎯 Goal:
Calm an anxious resident and briefly explain the coalition’s purpose while keeping the reply under 100 words.
📨 Input Events:
chat_msg resident_jose
"I’m nervous about tonight’s meeting. What exactly is this coalition and how will it help renters like me?"
Ready for Testing
Scene Order: 1
Budget workshop roadmap
ID: participatory-budget-plan
🎯 Goal:
Produce a clear, step-by-step roadmap (minimum 250 words) for a participatory budgeting workshop that centers marginalized voices.
📨 Input Events:
chat_msg city_official_lee
"Could you lay out a detailed plan for the workshop we’re hosting next month?"
Ready for Testing
Scene Order: 2
Youth mentorship prompt
ID: mentor-advice
🎯 Goal:
Offer actionable advice to a teen organizer in no more than 120 words, including at least one concrete next step.
📨 Input Events:
chat_msg teen_organizer_maya
"I want to start a tenants’ rights club at my high school. Any tips?"
Ready for Testing
Scene Order: 3
Coalition newsletter article
ID: newsletter-piece
🎯 Goal:
Write a 300–400 word newsletter article summarizing recent housing victories, upcoming events, and ending with a motivating call-to-action.
📨 Input Events:
chat_msg communications_director_kamal
"We need a newsletter piece highlighting our wins and what’s next. Can you draft it?"
Ready for Testing
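The length constraints in the four scene goals above can be checked mechanically. The following is a minimal sketch, not the suite's actual scoring logic: the helper name `within_word_limits`, the `WORD_LIMITS` table, and the whitespace-based word count are all assumptions made for illustration. Only the limits themselves come from the scene goals.

```python
# Hypothetical helper: the limits are taken from the scene goals,
# but the names and the word-counting rule are assumptions.
WORD_LIMITS = {
    "intro-townhall": (None, 99),              # "under 100 words"
    "participatory-budget-plan": (250, None),  # "minimum 250 words"
    "mentor-advice": (None, 120),              # "no more than 120 words"
    "newsletter-piece": (300, 400),            # "300-400 word" article
}


def within_word_limits(scene_id: str, reply: str) -> bool:
    """Return True if the reply's word count satisfies the scene's limits."""
    low, high = WORD_LIMITS[scene_id]
    count = len(reply.split())  # assumption: words are whitespace-separated
    if low is not None and count < low:
        return False
    if high is not None and count > high:
        return False
    return True
```

A check like this only covers length. The other parts of each goal (tone, a concrete next step, a call-to-action) would need separate evaluation.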
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 9428 ms (p95 13586 ms, avg 10486 ms, N 4)
  • neversleep/noromaid-20b 14670 ms (p95 34154 ms, avg 15686 ms, N 6)
  • meta-llama/llama-3.1-8b… 18073 ms (p95 27024 ms, avg 20733 ms, N 7)
  • qwen/qwen3-14b 19501 ms (p95 24008 ms, avg 19744 ms, N 4)
  • qwen/qwen-2.5-7b-instru… 22202 ms (p95 103889 ms, avg 39005 ms, N 7)
Slowest
  • microsoft/phi-3-medium-… 173336 ms (p95 191031 ms, avg 168150 ms, N 8)
  • microsoft/phi-3.5-mini-… 60383 ms (p95 183988 ms, avg 84288 ms, N 6)
  • [email protected]/Qw… 41280 ms (p95 45601 ms, avg 42107 ms, N 4)
  • deepseek/deepseek-r1-di… 31163 ms (p95 33089 ms, avg 30870 ms, N 4)
  • mistralai/mistral-7b-in… 28194 ms (p95 31321 ms, avg 27374 ms, N 8)
Per-scene duration for this suite.
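For readers who want to reproduce the p95, avg, and N figures from raw per-scene durations, here is a minimal sketch. The dashboard does not say which percentile method it uses, so the nearest-rank definition below is an assumption, as is the function name `summarize_latencies`.

```python
def summarize_latencies(samples_ms):
    """Summarize latency samples (in ms) as p95, average, and count.

    Assumption: p95 uses the nearest-rank method, which may differ
    from whatever the dashboard actually computes.
    """
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest rank: smallest rank r such that r / n >= 0.95 (integer math).
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "N": n}
```

With small sample counts such as N = 4, nearest-rank p95 is simply the largest sample, so a single slow run dominates the figure.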
Completion Progress 100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
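One plausible way to combine per-dimension scores using weights like those listed above is a normalized weighted average. This is a sketch under stated assumptions only: the page does not describe how scene scores are actually aggregated, only the three top weights are shown, and the function name `weighted_score` and the example inputs are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" list; the remaining
# dimensions and their weights are not shown on the page.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions that have both a score and a weight.

    Assumption: weights are renormalized over the dimensions present.
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted dimensions found in the scores")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total


# Hypothetical per-dimension scores for a single response.
example = weighted_score({
    "Character Authenticity": 0.9,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.8,
})
```

Renormalizing by the sum of the weights in use keeps the result in the same 0 to 1 range as the inputs even when only a subset of dimensions is available.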
Recent Runs
37807873  Dec. 17, 2025, midnight
43368786  Dec. 16, 2025, midnight
35075633  Dec. 15, 2025, midnight
37962679  Dec. 14, 2025, midnight
35216648  Dec. 13, 2025, midnight
42327119  Dec. 12, 2025, midnight
36836132  Dec. 11, 2025, midnight
36142167  Dec. 10, 2025, midnight
40947819  Dec. 9, 2025, midnight
36078064  Dec. 8, 2025, midnight
Latency Overview (This Suite)