Mateo Alvarez
african-presidents-julius-nyerere
v2.0
Ethical
Backstory: Mateo grew up in an agrarian village where he witnessed both the promise and precarity of small-plot farming. Shaped by socialist thinkers, he champions collective ownership of land, universal literacy campaigns, and cultural self-reliance to guard against economic dependency. He frames every policy debate in moral and communal terms, urging villagers to see themselves as stewards of one another’s well-being.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| literacy-query (Why literacy matters) | 0.910 | 0.798 | 0.930 | 0.829 | 0.000 (Error) | 0.846 | 0.910 | 0.870 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.890 | 0.000 (Error) | 0.862 | 0.927 | 0.900 |
| collective-farming-speech (Rally on cooperative fields) | 0.445 | 0.769 | 0.778 | 0.260 | 0.000 (Error) | 0.000 (Error) | 0.617 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.670 | 0.000 (Error) | 0.367 | 0.575 | 0.738 |
| self-reliance-definition (One-sentence definition) | 0.825 | 0.820 | 0.706 | 0.000 | 0.000 | 0.694 | 0.900 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.909 | 0.000 (Error) | 0.812 | 0.863 | 0.000 (Error) |
| library-radio-address (Radio plan for community library) | 0.000 | 0.000 | 0.820 | 0.502 | 0.000 | 0.547 | 0.644 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.571 | 0.000 (Error) | 0.669 | 0.254 | 0.000 (Error) |
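The matrix shows 0.000 both for cells flagged Error and for cells that were scored at zero, so a per-model average depends on how errored cells are treated. The following is a minimal sketch, not the dashboard's own aggregation (which is not shown here). It compares the two treatments on two columns copied from the matrix, `google/gemma-3-12b-…` and `qwen/qwen3-8b`. The function name `column_means` and the `(score, errored)` cell layout are assumptions made for illustration.

```python
from statistics import mean
from typing import Optional

# Each cell is (score, errored). Values are copied from two columns of the
# Scene Performance Matrix, in scene order.
GEMMA_3_12B = [(0.930, False), (0.778, False), (0.706, False), (0.820, False)]
QWEN3_8B = [(0.900, False), (0.738, False), (0.000, True), (0.000, True)]


def column_means(cells: list[tuple[float, bool]]) -> dict[str, Optional[float]]:
    """Return one model's mean score under two treatments of errored cells.

    Hypothetical helper: the dashboard's real aggregation rule is unknown.
    """
    scored = [score for score, errored in cells if not errored]
    return {
        # Errored cells count as the 0.000 the dashboard displays.
        "errors_as_zero": mean(score for score, _ in cells),
        # Errored cells are dropped; None when every scene errored.
        "errors_excluded": mean(scored) if scored else None,
    }


if __name__ == "__main__":
    for label, cells in (("gemma-3-12b", GEMMA_3_12B), ("qwen3-8b", QWEN3_8B)):
        print(label, column_means(cells))
```

For a column with no errors the two numbers coincide. For `qwen/qwen3-8b`, which errored on two of four scenes, they differ by roughly a factor of two, so any ranking drawn from the matrix should state which treatment it uses.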
Test Scenes (4)
Scene Order 0: Why literacy matters
ID: literacy-query
🎯 Goal: Answer briefly yet thoughtfully, stressing how universal literacy empowers collective decision-making.
📨 Input Events: chat_msg (villager:Ana)
"Comrade Mateo, some elders say reading won’t plough our fields. Why insist on literacy?"
Ready for Testing
Scene Order 1: Rally on cooperative fields
ID: collective-farming-speech
🎯 Goal: Deliver a rousing speech of at least two paragraphs that extols collective farming through moral and communal lenses, avoiding any AI-revealing phrases.
📨 Input Events: world_event (town_crier)
"The village gathers at dusk to hear Mateo speak about reorganising land into a cooperative."
Ready for Testing
Scene Order 2: One-sentence definition
ID: self-reliance-definition
🎯 Goal: Give a single clear sentence defining cultural self-reliance in egalitarian terms.
📨 Input Events: chat_msg (journalist:Rahim)
"For tomorrow’s paper, can you define ‘cultural self-reliance’ in one sentence?"
Ready for Testing
Scene Order 3: Radio plan for community library
ID: library-radio-address
🎯 Goal: Provide a radio address roughly 200+ words (3–4 minutes spoken) that blends philosophical reflection with a concrete step-by-step plan to build a communal library.
📨 Input Events: world_event (radio_host)
"You are live on Rural Voice FM, listeners await your proposal for a village library."
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 183 ms (p95 534 ms • avg 279 ms • N 4)
- [email protected]/Qw…: 257 ms (p95 543 ms • avg 315 ms • N 4)
- [email protected]/Qw…: 11941 ms (p95 13242 ms • avg 11951 ms • N 4)
- meta-llama/llama-3.1-8b…: 24256 ms (p95 60378 ms • avg 28397 ms • N 12)
- google/gemini-2.5-flash: 24414 ms (p95 35881 ms • avg 24824 ms • N 19)
Slowest
- microsoft/phi-3-medium-…: 285257 ms (p95 397184 ms • avg 260589 ms • N 18)
- [email protected]/Qw…: 167973 ms (p95 171135 ms • avg 168484 ms • N 4)
- qwen/qwen3-8b: 92263 ms (p95 143644 ms • avg 89146 ms • N 18)
- [email protected]/Qw…: 43254 ms (p95 214974 ms • avg 92390 ms • N 4)
- deepseek/deepseek-r1-di…: 35465 ms (p95 44362 ms • avg 36417 ms • N 23)
Per-scene duration for this suite.
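The latency lists report a p95, an average, and a sample count N per model. The sketch below shows one way such a summary could be computed from raw per-scene durations. It uses the nearest-rank percentile definition, which is an assumption: the page does not say which percentile method the dashboard applies. The function names and the sample durations are hypothetical and are not taken from any run listed here.

```python
import math
from statistics import mean


def p95_nearest_rank(durations_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample such that at least
    95% of all samples are less than or equal to it."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarize(durations_ms: list[float]) -> dict[str, float]:
    """Hypothetical summary with the same three fields the lists display."""
    return {
        "p95": p95_nearest_rank(durations_ms),
        "avg": mean(durations_ms),
        "n": len(durations_ms),
    }


if __name__ == "__main__":
    # Illustrative durations in milliseconds, not real measurements.
    sample = [200, 250, 300, 650]
    print(summarize(sample))
```

Under this definition, a model with only four samples has a p95 equal to its slowest run, so a single slow scene can pull the p95 far above the average. That is worth keeping in mind when comparing the N 4 entries with models that have 12 or more samples.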
Completion Progress: 100% (4 of 4 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
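Only the three top-weighted dimensions are listed, and their weights sum to 0.473, so the remaining weight belongs to dimensions not shown on this page. As a rough sketch of how weights like these might combine into a single score, the example below takes a weighted average over whichever dimensions have scores and renormalizes by the weight actually used. The renormalization, the function name `weighted_score`, and the per-dimension scores are all assumptions for illustration. The framework's actual formula is not documented here.

```python
# Weights copied from the "Top Weighted Dimensions" list.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores: dict[str, float],
                   weights: dict[str, float] = TOP_WEIGHTS) -> float:
    """Weighted average over the dimensions present in both mappings.

    Hypothetical helper: it renormalizes by the weight actually used, so a
    partial set of dimensions still yields a value between 0 and 1.
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    weighted_sum = sum(dimension_scores[name] * w for name, w in used.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Illustrative per-dimension scores, not taken from any run on this page.
    example = {
        "Character Authenticity": 0.9,
        "Plan Validity": 0.8,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example), 3))
```

Because Character Authenticity carries the largest weight, a response that breaks character loses more under this scheme than one with an equally weak plan.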
Recent Runs
- 13124441: Dec. 17, 2025, midnight
- 15583848: Dec. 16, 2025, midnight
- 12774741: Dec. 15, 2025, midnight
- 13801536: Dec. 14, 2025, midnight
- 12313272: Dec. 13, 2025, midnight
- 15797295: Dec. 12, 2025, midnight
- 13320895: Dec. 11, 2025, midnight
- 12498817: Dec. 10, 2025, midnight
- 14862744: Dec. 9, 2025, midnight
- 12505999: Dec. 8, 2025, midnight