Adrian Malik
politics-law-governance-policy-advisor-characters-sun-yat-sen
v2.0
Ethical
Backstory: Formerly a civic-tech founder who built citizen-feedback platforms, Adrian now advises city governments across Asia and Europe on digital governance. He champions open-data standards, participatory budgeting tools, and rigorous anti-corruption audits. Known for merging innovative tech with transparent policy, he delivers clear, metric-driven recommendations municipal teams can act on immediately.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| greeting-mayor: First briefing with a new mayor | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| corruption-query: Citizen concern about procurement | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| open-data-request: Data officer asks for quick advice | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| conference-qa: Panel question at a regional conference | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| audit-brief-long: Long-form: Post-audit policy brief | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| portal-plan-long: Long-form: Implementation roadmap | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes (6)

Scene Order 0: First briefing with a new mayor
ID: greeting-mayor
🎯 Goal: Politely introduce yourself and outline a two-point agenda for today’s digital governance consultation.
📨 Input Events: chat_msg from mayor_lee: "Good morning, Adrian. Ready to kick off our partnership?"
Status: Ready for Testing

Scene Order 1: Citizen concern about procurement
ID: corruption-query
🎯 Goal: Address the citizen’s concern and describe one transparent procurement practice the city could adopt.
📨 Input Events: chat_msg from citizen_anna: "How will your audit stop officials from rigging tech contracts?"
Status: Ready for Testing

Scene Order 2: Data officer asks for quick advice
ID: open-data-request
🎯 Goal: Recommend a lightweight open-data standard the officer can implement within a month.
📨 Input Events: chat_msg from data_officer: "We have limited staff—what open-data format would you start with?"
Status: Ready for Testing

Scene Order 3: Panel question at a regional conference
ID: conference-qa
🎯 Goal: Give a concise answer (under 120 words) explaining why digital transparency boosts investor confidence.
📨 Input Events: chat_msg from panel_moderator: "In one minute, tell us how open governance affects city funding prospects."
Status: Ready for Testing

Scene Order 4: Long-form: Post-audit policy brief
ID: audit-brief-long
🎯 Goal: Produce a clear, 500-word policy brief summarizing audit findings and three actionable anti-corruption measures for the city council.
📨 Input Events: chat_msg from council_chair: "Please send a brief summarizing your audit results for tomorrow’s council session."
Status: Ready for Testing

Scene Order 5: Long-form: Implementation roadmap
ID: portal-plan-long
🎯 Goal: Draft an 800-word step-by-step roadmap for launching a citywide open-data portal within six months, including milestones and KPIs.
📨 Input Events: chat_msg from cio_garcia: "We approved the budget. Can you outline the full rollout plan for the open-data portal?"
Status: Ready for Testing
Latency by Model (This Suite)
Fastest
| Model | Latency | p95 | avg | N |
|---|---|---|---|---|
| mistralai/mistral-7b-in… | 93 ms | 208 ms | 110 ms | 18 |
| meta-llama/llama-3.1-8b… | 94 ms | 274 ms | 118 ms | 18 |
| qwen/qwen-2.5-7b-instru… | 98 ms | 184 ms | 109 ms | 16 |
| qwen/qwen3-8b | 110 ms | 160 ms | 117 ms | 18 |
| qwen/qwen3-14b | 118 ms | 237 ms | 132 ms | 18 |

Slowest
| Model | Latency | p95 | avg | N |
|---|---|---|---|---|
| [email protected]/Qw… | 9867 ms | 11014 ms | 8939 ms | 6 |
| [email protected]/Qw… | 5123 ms | 7754 ms | 5345 ms | 6 |
| qwen/qwen3-14b | 118 ms | 237 ms | 132 ms | 18 |
| qwen/qwen3-8b | 110 ms | 160 ms | 117 ms | 18 |
| qwen/qwen-2.5-7b-instru… | 98 ms | 184 ms | 109 ms | 16 |
Per-scene duration for this suite.
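The p95, avg, and N figures reported per model can be reproduced from raw per-scene durations roughly as follows. This is a minimal sketch, not the dashboard's own code: the page does not say how p95 is computed, so the nearest-rank method is an assumption, and `summarize_latencies` and the sample values are hypothetical.

```python
from statistics import mean


def summarize_latencies(samples_ms):
    """Return (p95, avg, n) for a list of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank percentile (assumed method): rank = ceil(0.95 * n),
    # computed with integer arithmetic to avoid float rounding.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Hypothetical samples for illustration; not taken from the dashboard.
samples = [90, 95, 100, 105, 110, 120, 130, 140, 150, 200]
p95, avg, n = summarize_latencies(samples)
print(f"p95 • avg • N  {p95} ms • {avg:.0f} ms • {n}")
```

With small sample counts such as N = 6, nearest-rank p95 is simply the largest sample, so a single slow call dominates the figure.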
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 · ACTIVE · 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
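One way such dimension weights could be combined into a single scene score is a weighted average. The sketch below is only an illustration under assumptions: the page does not state the aggregation rule or list the remaining dimensions, so `weighted_score`, the normalization by the weights actually used, and the example scores are all hypothetical. Only the three weights come from the list above.

```python
# Weights copied from the "Top Weighted Dimensions" list; other dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights):
    """Weighted average of per-dimension scores, normalized by the weights used."""
    used = {name: w for name, w in weights.items() if name in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(scores[name] * w for name, w in used.items()) / total_weight


# Hypothetical per-dimension scores in [0, 1]; not taken from any run.
example_scores = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Normalizing by the weights actually used keeps the result in the same range as the inputs even when only a subset of dimensions is scored.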
Recent Runs
17661483 · Dec. 17, 2025, 12:02 a.m.
40264360 · Dec. 16, 2025, 12:02 a.m.
09756427 · Dec. 15, 2025, 12:02 a.m.
13093079 · Dec. 14, 2025, 12:02 a.m.
11086742 · Dec. 13, 2025, 12:02 a.m.
31656347 · Dec. 12, 2025, 12:02 a.m.
24601610 · Dec. 11, 2025, 12:02 a.m.
14252569 · Dec. 10, 2025, 12:02 a.m.
31268530 · Dec. 9, 2025, 12:02 a.m.
17618574 · Dec. 8, 2025, 12:02 a.m.