Prime Minister Elena Markovic
politics-law-governance-prime-minister-characters-margaret-thatcher
v2.0
Ethical
Backstory: Elena Markovic climbed from a blue-collar district to lead a fragile coalition government. Celebrated for pairing bold market reforms with robust social programs, she keeps a watchful eye on the budget while protecting the vulnerable. Her leadership style is decisive yet inclusive, and she is known for inviting opposition voices into crisis talks.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Annual Budget Speech (budget-address) | 0.285 | 0.265 | 0.410 | 0.463 | 0.000 | 0.337 | 0.544 | 0.357 | 0.000 (Error) | 0.684 | 0.298 | 0.499 | 0.666 |
| Televised Interview Clip (tv-interview) | 0.589 | 0.608 | 0.681 | 0.557 | 0.000 | 0.632 | 0.855 | 0.000 (Error) | 0.000 (Error) | 0.686 | 0.000 | 0.762 | 0.687 |
| Crisis Statement to Joint Session (emergency-session) | 0.495 | 0.726 | 0.788 | 0.778 | 0.000 | 0.000 (Error) | 0.616 | 0.000 (Error) | 0.000 (Error) | 0.765 | 0.690 | 0.596 | 0.607 |
| Constituent Concern (community-townhall) | 0.595 | 0.613 | 0.673 | 0.688 | 0.000 (Error) | 0.721 | 0.753 | 0.000 (Error) | 0.000 (Error) | 0.680 | 0.527 | 0.789 | 0.665 |

Test Scenes (4)
Scene Order 0: Annual Budget Speech
ID: budget-address
🎯 Goal: Deliver a three-paragraph parliamentary speech outlining balanced fiscal measures and expanded childcare subsidies in a confident, inclusive voice.
📨 Input Events: world_event (parliament_clerk): "Madam Prime Minister, the floor is yours for the budget address."
Status: Ready for Testing
Scene Order 1: Televised Interview Clip
ID: tv-interview
🎯 Goal: Respond to a journalist’s question about rising fuel prices in under 120 words, showing pragmatic empathy and offering a concrete action.
📨 Input Events: chat_msg (journalist:Karen Liu): "Prime Minister, motorists are angry about soaring fuel prices. What is your plan?"
Status: Ready for Testing
Scene Order 2: Crisis Statement to Joint Session
ID: emergency-session
🎯 Goal: Issue a two-minute statement that presents a clear three-point, cross-party recovery plan after major flooding, emphasizing unity and fiscal clarity.
📨 Input Events: world_event (speaker_of_house): "The Prime Minister will now address the joint session regarding the floods in the Midlands."
Status: Ready for Testing
Scene Order 3: Constituent Concern
ID: community-townhall
🎯 Goal: Reassure an unemployed constituent in a friendly, actionable reply not exceeding 100 words.
📨 Input Events: chat_msg (viewer:citizen_204): "I lost my factory job last month. How will your government help people like me?"
Status: Ready for Testing
Latency by Model (This Suite)
Fastest
- neversleep/noromaid-20b: 8418 ms (p95 31917 ms, avg 14547 ms, N = 7)
- [email protected]/Qw…: 13157 ms (p95 16559 ms, avg 12990 ms, N = 4)
- google/gemini-2.5-flash: 21341 ms (p95 27796 ms, avg 21996 ms, N = 7)
- qwen/qwen-2.5-7b-instru…: 21766 ms (p95 117334 ms, avg 46758 ms, N = 5)
- google/gemma-3-12b-it: 25548 ms (p95 44351 ms, avg 27346 ms, N = 6)
Slowest
- microsoft/phi-3-medium-…: 136971 ms (p95 148210 ms, avg 130664 ms, N = 8)
- microsoft/phi-3.5-mini-…: 42424 ms (p95 208305 ms, avg 79082 ms, N = 5)
- [email protected]/Qw…: 41164 ms (p95 43127 ms, avg 41211 ms, N = 4)
- deepseek/deepseek-r1-di…: 35625 ms (p95 56407 ms, avg 39050 ms, N = 6)
- qwen/qwen3-8b: 27907 ms (p95 32135 ms, avg 27626 ms, N = 7)
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 · Active · 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
Recent Runs
- 39081843: Dec. 17, 2025, midnight
- 44691450: Dec. 16, 2025, midnight
- 36258320: Dec. 15, 2025, midnight
- 39112213: Dec. 14, 2025, midnight
- 36419709: Dec. 13, 2025, midnight
- 44028401: Dec. 12, 2025, midnight
- 38124387: Dec. 11, 2025, midnight
- 37471822: Dec. 10, 2025, midnight
- 42377695: Dec. 9, 2025, midnight
- 37303296: Dec. 8, 2025, midnight