Leila Hammond

politics-law-governance-policy-advisor-characters-mary-wollstonecraft v2.0 Ethical
Backstory: Leila Hammond is a second-generation immigrant with a joint degree in economics and public administration. Having served in both city hall and a federal think tank, she excels at turning dense datasets into community-focused recommendations. Fluent in English, Spanish, and Arabic, she is known for her empathetic facilitation of cross-cultural dialogues on affordable housing and digital privacy.
100% Complete (4/4 scenes)
Model Performance Overview
Scene Performance Matrix
Model                   summary-of-report   mayor-privacy-response   housing-brief   spanish-townhall
deepseek/deepseek-r…    0.624               0.557                    0.421           0.655
google/gemini-2.5-f…    0.652               0.678                    0.680           0.706
google/gemma-3-12b-…    0.636               0.726                    0.203           0.754
meta-llama/llama-3.…    0.682               0.000                    0.301           0.667
microsoft/phi-3-med…    0.000               0.000 (Error)            0.000           0.000
microsoft/phi-3.5-m…    0.626               0.000 (Error)            0.729           0.389
mistralai/mistral-7…    0.601               0.699                    0.508           0.693
neversleep/noromaid…    0.508               0.575                    0.000 (Error)   0.480
[email protected]        0.000 (Error)       0.000 (Error)            0.000 (Error)   0.000 (Error)
[email protected]        0.651               0.676                    0.489           0.412
qwen/qwen-2.5-7b-in…    0.560               0.033                    0.303           0.000
qwen/qwen3-14b          0.627               0.563                    0.498           0.574
qwen/qwen3-8b           0.602               0.666                    0.559           0.597
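To compare models across the whole suite, the per-scene scores can be averaged. The Python sketch below transcribes five rows of the matrix and averages them. The dashboard does not document how an "Error" cell should count, so the hypothetical helper `suite_mean` supports both scoring it as 0.0 and leaving it out of the average.

```python
from __future__ import annotations

from statistics import mean

# Five rows transcribed from the Scene Performance Matrix. Column order:
# summary-of-report, mayor-privacy-response, housing-brief, spanish-townhall.
# None marks a cell flagged "Error" (displayed as 0.000 on the dashboard).
SCORES: dict[str, list[float | None]] = {
    "google/gemini-2.5-f…": [0.652, 0.678, 0.680, 0.706],
    "google/gemma-3-12b-…": [0.636, 0.726, 0.203, 0.754],
    "meta-llama/llama-3.…": [0.682, 0.000, 0.301, 0.667],
    "microsoft/phi-3.5-m…": [0.626, None, 0.729, 0.389],
    "qwen/qwen3-8b": [0.602, 0.666, 0.559, 0.597],
}


def suite_mean(row: list[float | None], errors_as_zero: bool = True) -> float:
    """Average one model's scene scores (hypothetical helper).

    errors_as_zero=True scores an Error cell as 0.0, as the dashboard displays it;
    errors_as_zero=False leaves Error cells out of the average.
    """
    if errors_as_zero:
        values = [0.0 if s is None else s for s in row]
    else:
        values = [s for s in row if s is not None]
    if not values:
        raise ValueError("no scored scenes to average")
    return mean(values)


if __name__ == "__main__":
    # Rank models by their errors-as-zero average, highest first.
    ranked = sorted(SCORES.items(), key=lambda item: suite_mean(item[1]), reverse=True)
    for model, row in ranked:
        print(f"{model:<22} {suite_mean(row):.3f}  errors dropped: {suite_mean(row, False):.3f}")
```

With errors counted as 0.0, microsoft/phi-3.5-m… averages 0.436; dropping its Error cell instead raises that to about 0.581, so the convention matters when ranking models that failed a scene.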
Test Scenes (4)

Scene Order: 0
Concise housing report summary
ID: summary-of-report
🎯 Goal:
Deliver a clear, data-backed summary of the 2024 Housing Affordability Index in under 150 words, using plain language understandable to community members.
📨 Input Events:
chat_msg viewer:community_leader
"Leila, could you give me a quick rundown of the 2024 Housing Affordability Index? Keep it brief so I can share it at tonight's meeting."
Ready for Testing

Scene Order: 1
Rapid response to mayor's privacy concern
ID: mayor-privacy-response
🎯 Goal:
Offer an empathetic yet solution-oriented reply, in 200 words or fewer, that outlines three immediate policy actions the city can take.
📨 Input Events:
world_event Mayor_Office_Tweet
"URGENT: Data breach at local transit app exposed rider locations. City residents demand answers. #DigitalPrivacyCrisis"
Ready for Testing

Scene Order: 2
Long-form policy brief on affordable housing
ID: housing-brief
🎯 Goal:
Produce a structured brief (~500 words) with the headings Overview, Data Insights, Community Impact, and Recommendations, citing at least two data points and reflecting community voices.
📨 Input Events:
chat_msg viewer:state_legislator
"Please draft a policy brief I can circulate to the Housing Committee. We need both the numbers and the human angle."
Ready for Testing

Scene Order: 3
Spanish response for multicultural town hall
ID: spanish-townhall
🎯 Goal:
Respond entirely in Spanish, in at least 200 words, conveying empathy and outlining next steps for an inclusive town hall on data privacy.
📨 Input Events:
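superchat viewer:vecina_maria YouTube $20
"Leila, ¿cómo aseguraremos que la próxima reunión sobre privacidad digital incluya a los vecinos que no hablan inglés?"
(English: "Leila, how will we make sure the next meeting on digital privacy includes the neighbors who don't speak English?")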
Ready for Testing
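Most scene goals carry checkable length and structure requirements: under 150 words, within 200 words, roughly 500 words with four named headings, and at least 200 words. The sketch below encodes them as hypothetical `SceneConstraint` rules and checks a response's word count and headings. The 400 to 600 word band for "~500 words", the regex-based word count, and the heading-on-its-own-line rule are all assumptions; qualitative requirements such as empathy, plain language, or responding in Spanish are not checked.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class SceneConstraint:
    """Length and heading rules read off a scene's Goal text (hypothetical encoding)."""
    min_words: int | None = None
    max_words: int | None = None
    required_headings: list[str] = field(default_factory=list)


# Hypothetical encodings of the four goals. "~500 words" is interpreted here as
# 400-600 words; that tolerance is an assumption, not part of the suite definition.
CONSTRAINTS = {
    "summary-of-report": SceneConstraint(max_words=149),  # "under 150 words"
    "mayor-privacy-response": SceneConstraint(max_words=200),
    "housing-brief": SceneConstraint(
        min_words=400,
        max_words=600,
        required_headings=["Overview", "Data Insights", "Community Impact", "Recommendations"],
    ),
    "spanish-townhall": SceneConstraint(min_words=200),
}


def word_count(text: str) -> int:
    # Count runs of word characters (Unicode-aware, so "cómo" is one word),
    # allowing internal apostrophes and hyphens ("don't", "data-backed").
    return len(re.findall(r"\w+(?:['-]\w+)*", text))


def check(scene_id: str, text: str) -> list[str]:
    """Return human-readable violations for a response (empty list if it passes)."""
    rule = CONSTRAINTS[scene_id]
    problems: list[str] = []
    n = word_count(text)
    if rule.min_words is not None and n < rule.min_words:
        problems.append(f"{n} words, expected at least {rule.min_words}")
    if rule.max_words is not None and n > rule.max_words:
        problems.append(f"{n} words, expected at most {rule.max_words}")
    for heading in rule.required_headings:
        # A heading counts only if it sits on its own line, optionally prefixed by '#'.
        if not re.search(rf"^\s*#*\s*{re.escape(heading)}\s*:?\s*$", text, re.MULTILINE):
            problems.append(f"missing heading: {heading}")
    return problems
```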
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 12145 ms (p95 12938 ms, avg 12343 ms, N = 4)
  • meta-llama/llama-3.1-8b… 19836 ms (p95 495709 ms, avg 140843 ms, N = 5)
  • neversleep/noromaid-20b 22492 ms (p95 40716 ms, avg 21837 ms, N = 6)
  • google/gemini-2.5-flash 22522 ms (p95 240873 ms, avg 65623 ms, N = 7)
  • google/gemma-3-12b-it 25990 ms (p95 53451 ms, avg 32438 ms, N = 4)
Slowest
  • microsoft/phi-3-medium-… 163562 ms (p95 216383 ms, avg 159770 ms, N = 8)
  • [email protected]/Qw… 43716 ms (p95 47965 ms, avg 44535 ms, N = 4)
  • microsoft/phi-3.5-mini-… 34877 ms (p95 87650 ms, avg 45045 ms, N = 4)
  • deepseek/deepseek-r1-di… 34148 ms (p95 39461 ms, avg 34099 ms, N = 4)
  • qwen/qwen-2.5-7b-instru… 33239 ms (p95 138486 ms, avg 69821 ms, N = 5)
Per-scene duration for this suite.
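The dashboard does not state how its p95 and average figures are computed, nor which statistic the headline value for each model is. The sketch below shows one common way to summarise a model's per-scene durations, assuming an arithmetic mean and a nearest-rank p95; `latency_summary` and the sample durations are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean


def latency_summary(durations_ms: list[float]) -> dict[str, float]:
    """Summarise one model's per-scene durations in milliseconds (hypothetical helper).

    Assumes an arithmetic mean and a nearest-rank p95: the ceil(0.95 * N)-th
    smallest duration. The dashboard's own method is not documented.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return {
        "N": len(ordered),
        "avg": mean(ordered),
        "p95": ordered[rank - 1],
    }


# Hypothetical durations for one model over four scenes (not data from this suite).
print(latency_summary([12000, 12500, 13000, 40000]))
```

Under the nearest-rank definition, ceil(0.95 × N) equals N for any N up to 19, so with the 4 to 8 runs per model reported here the p95 would simply be the slowest run. If the dashboard uses this definition, a single slow run could push p95 far above the average, as with meta-llama/llama-3.1-8b… (p95 495709 ms vs. avg 140843 ms).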
Evaluation Schema
Enhanced Framework
Version: v2 (active)
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
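The listing shows dimension weights but not how they combine. If a scene score is a weighted mean of per-dimension scores (an assumption; the framework's formula is not shown on this page), it could be computed as in the sketch below. Only the three listed weights are used, the per-dimension scores are invented for illustration, and `weighted_score` is a hypothetical helper that normalises by the weights actually supplied.

```python
from __future__ import annotations


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores, normalised by the supplied weights.

    Assumption: this aggregation rule is illustrative, not the framework's documented formula.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[d] * scores[d] for d in weights) / total_weight


# Weights of the three top dimensions as listed on the dashboard; the other
# dimensions and their weights are not shown there, so they are omitted here.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

# Hypothetical per-dimension scores for a single scene response.
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}

print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Normalising by the supplied weights keeps the result on the same scale as the inputs even though the three listed weights sum to only 0.473. If the full set of weights sums to 1, the unlisted dimensions carry the remaining 0.527.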
Recent Runs
  • 38656179 (Dec. 17, 2025, midnight)
  • 44237297 (Dec. 16, 2025, midnight)
  • 35791188 (Dec. 15, 2025, midnight)
  • 38694935 (Dec. 14, 2025, midnight)
  • 36047033 (Dec. 13, 2025, midnight)
  • 43472451 (Dec. 12, 2025, midnight)
  • 37691497 (Dec. 11, 2025, midnight)
  • 37023063 (Dec. 10, 2025, midnight)
  • 41892442 (Dec. 9, 2025, midnight)
  • 36871950 (Dec. 8, 2025, midnight)
Latency Overview (This Suite)