Amina Duarte

african-presidents-ellen-johnson-sirleaf v2.0 Ethical
Backstory: Amina Duarte is a pioneering female head of state celebrated for engineering sweeping economic reforms in her nation. She steered successful debt-relief negotiations, rooted out systemic corruption, and modernized bloated public-sector institutions. A tireless advocate for women’s political participation, she blends resilience with rigorous, data-driven decision-making.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes (columns):
  debt-relief-explainer: Citizen asks about debt-relief strategy
  women-leadership-advice: Guidance for young women in politics
  annual-budget-address: Televised budget address
  memoir-chapter-transparency: Draft memoir chapter on transparency fight

Rows follow the original column order of the matrix. Cells marked (Error) were flagged as errors on the dashboard.

#   Model                 debt-relief-explainer  women-leadership-advice  annual-budget-address  memoir-chapter-transparency
1   deepseek/deepseek-r…  0.356                  0.657                    0.199                  0.480
2   google/gemini-2.5-f…  0.000 (Error)          0.561                    0.446                  0.531
3   google/gemma-3-12b-…  0.822                  0.661                    0.604                  0.365
4   meta-llama/llama-3.…  0.449                  0.665                    0.000                  0.237
5   microsoft/phi-3-med…  0.000 (Error)          0.000                    0.000                  0.000
6   microsoft/phi-3.5-m…  0.000 (Error)          0.489                    0.000 (Error)          0.000
7   mistralai/mistral-7…  0.730                  0.618                    0.450                  0.372
8   neversleep/noromaid…  0.000 (Error)          0.000 (Error)            0.290                  0.000 (Error)
9   [email protected]     0.000 (Error)          0.000 (Error)            0.000 (Error)          0.000 (Error)
10  [email protected]     0.000 (Error)          0.000 (Error)            0.000 (Error)          0.000 (Error)
11  [email protected]     0.000 (Error)          0.000 (Error)            0.000 (Error)          0.000 (Error)
12  [email protected]     0.609                  0.757                    0.827                  0.650
13  [email protected]     0.000 (Error)          0.000 (Error)            0.000 (Error)          0.000 (Error)
14  qwen/qwen-2.5-7b-in…  0.495                  0.369                    0.501                  0.564
15  qwen/qwen3-14b        0.749                  0.619                    0.583                  0.386
16  qwen/qwen3-8b         0.000                  0.000                    0.000                  0.000
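One way to compare models across the matrix is to average each model's scene scores while counting error cells separately, so that a failed run is not read as a genuine zero. The sketch below does this for three of the sixteen models, with scores copied from the matrix. The `summarize` helper and the choice to exclude error cells from the mean are assumptions made for illustration; the dashboard does not say how it aggregates.

```python
from statistics import mean

# Scores copied from the matrix for three of the sixteen models, in scene order.
# None marks a cell the dashboard flagged as "Error".
SCORES = {
    "google/gemma-3-12b-…": [0.822, 0.661, 0.604, 0.365],
    "google/gemini-2.5-f…": [None, 0.561, 0.446, 0.531],
    "qwen/qwen3-14b": [0.749, 0.619, 0.583, 0.386],
}


def summarize(cells):
    """Return (mean over non-error cells, number of error cells) for one model."""
    valid = [c for c in cells if c is not None]
    avg = round(mean(valid), 3) if valid else None
    return avg, len(cells) - len(valid)


for model, cells in SCORES.items():
    print(model, summarize(cells))
```

Under these assumptions the gemma row averages 0.613 with no errors, while the gemini row averages 0.513 over its three valid scenes with one error.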
Test Scenes (4)

Scene Order 0: Citizen asks about debt-relief strategy
ID: debt-relief-explainer
🎯 Goal: Give a concise, numbers-backed explanation of how the debt-relief package works and why it matters.
📨 Input Events:
chat_msg citizen_mark: "President Duarte, could you explain in simple terms how your debt-relief plan actually helps ordinary people?"
Status: Ready for Testing

Scene Order 1: Guidance for young women in politics
ID: women-leadership-advice
🎯 Goal: Offer actionable, motivational advice to young women seeking elected office, reflecting personal experience and data on gender gaps.
📨 Input Events:
chat_msg student_lina: "Madam President, what concrete steps should a young woman like me take if I want to enter politics and be effective?"
Status: Ready for Testing

Scene Order 2: Televised budget address
ID: annual-budget-address
🎯 Goal: Deliver a structured, uplifting speech of at least 300 words outlining fiscal priorities, anti-corruption safeguards, and gender-inclusive investments.
📨 Input Events:
chat_msg press_secretary: "The broadcast starts in two minutes. Please present your annual budget address to the nation."
Status: Ready for Testing

Scene Order 3: Draft memoir chapter on transparency fight
ID: memoir-chapter-transparency
🎯 Goal: Write a vivid, 700–800 word chapter narrating key battles against corruption, including personal anecdotes, data proofs, and leadership lessons.
📨 Input Events:
chat_msg publisher_editor: "We need the next memoir chapter focusing on your transparency reforms. Please submit it today."
Status: Ready for Testing
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 188 ms (p95 572 ms, avg 293 ms, N 4)
  • [email protected]/Qw…: 392 ms (p95 643 ms, avg 407 ms, N 4)
  • [email protected]/Qw…: 534 ms (p95 1020 ms, avg 579 ms, N 4)
  • [email protected]/Qw…: 17040 ms (p95 20329 ms, avg 16679 ms, N 4)
  • neversleep/noromaid-20b: 18822 ms (p95 49227 ms, avg 20833 ms, N 16)
Slowest
  • microsoft/phi-3-medium-…: 236496 ms (p95 496255 ms, avg 295937 ms, N 17)
  • qwen/qwen3-8b: 118129 ms (p95 203116 ms, avg 119706 ms, N 20)
  • [email protected]/Qw…: 45874 ms (p95 231988 ms, avg 98467 ms, N 4)
  • microsoft/phi-3.5-mini-…: 37373 ms (p95 244272 ms, avg 58361 ms, N 17)
  • qwen/qwen3-14b: 34774 ms (p95 51456 ms, avg 37122 ms, N 13)
Per-scene duration for this suite.
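For readers who want to reproduce a p95, avg, N summary from their own per-scene durations, a minimal sketch follows. It assumes the nearest-rank percentile definition; the dashboard does not state which definition it uses, so its figures may differ. The function name `summarize_latency` and the sample durations are hypothetical.

```python
def summarize_latency(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds.

    Assumption: p95 uses the nearest-rank definition, i.e. the value at
    1-based rank ceil(0.95 * n) in the sorted list.
    """
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1], sum(ordered) / n, n


# Hypothetical per-scene durations, not taken from the dashboard.
sample = [120, 180, 300, 560]
print(summarize_latency(sample))  # (560, 290.0, 4)
```

With only four samples the nearest-rank p95 is simply the slowest run, which is worth keeping in mind when comparing models whose N differs.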
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
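The page lists dimension weights but does not say how they are combined into a scene score. As an illustration only, the sketch below assumes a linear weighted average renormalized over the dimensions supplied. The three weights are copied from the list above; the function `weighted_score` and the example per-dimension scores are hypothetical, and the remaining dimensions of the framework are not shown on the page.

```python
# Weights copied from the "Top Weighted Dimensions" list. The other
# dimensions and their weights are not shown on the page.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Assumption: scores combine linearly and are renormalized by the sum of
    the weights actually used. The dashboard does not state its formula.
    """
    used = [d for d in dimension_scores if d in weights]
    total_weight = sum(weights[d] for d in used)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to combine")
    return sum(weights[d] * dimension_scores[d] for d in used) / total_weight


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Renormalizing by the weights in use keeps the result in the same range as the inputs even though only three of the framework's weights are known here.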
Recent Runs
  • 12588704: Dec. 17, 2025, midnight
  • 14953612: Dec. 16, 2025, midnight
  • 12047908: Dec. 15, 2025, midnight
  • 13179215: Dec. 14, 2025, midnight
  • 11654090: Dec. 13, 2025, midnight
  • 15152897: Dec. 12, 2025, midnight
  • 12772847: Dec. 11, 2025, midnight
  • 11932566: Dec. 10, 2025, midnight
  • 14269794: Dec. 9, 2025, midnight
  • 11897798: Dec. 8, 2025, midnight
Latency Overview (This Suite)