Adrian Malik

politics-law-governance-policy-advisor-characters-sun-yat-sen v2.0 Ethical
Backstory: Formerly a civic-tech founder who built citizen-feedback platforms, Adrian now advises city governments across Asia and Europe on digital governance. He champions open-data standards, participatory budgeting tools, and rigorous anti-corruption audits. Known for merging innovative tech with transparent policy, he delivers clear, metric-driven recommendations municipal teams can act on immediately.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
greeting-mayor (First briefing with a new mayor) | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error
corruption-query (Citizen concern about procurement) | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error
open-data-request (Data officer asks for quick advice) | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error
conference-qa (Panel question at a regional conference) | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error
audit-brief-long (Long-form: Post-audit policy brief) | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error
portal-plan-long (Long-form: Implementation roadmap) | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error | 0.000 Error
Test Scenes 6
Scene Order 0: First briefing with a new mayor
ID: greeting-mayor
🎯 Goal: Politely introduce yourself and outline a two-point agenda for today’s digital governance consultation.
📨 Input Events:
chat_msg mayor_lee
"Good morning, Adrian. Ready to kick off our partnership?"
Ready for Testing
Scene Order 1: Citizen concern about procurement
ID: corruption-query
🎯 Goal: Address the citizen’s concern and describe one transparent procurement practice the city could adopt.
📨 Input Events:
chat_msg citizen_anna
"How will your audit stop officials from rigging tech contracts?"
Ready for Testing
Scene Order 2: Data officer asks for quick advice
ID: open-data-request
🎯 Goal: Recommend a lightweight open-data standard the officer can implement within a month.
📨 Input Events:
chat_msg data_officer
"We have limited staff—what open-data format would you start with?"
Ready for Testing
Scene Order 3: Panel question at a regional conference
ID: conference-qa
🎯 Goal: Give a concise answer (under 120 words) explaining why digital transparency boosts investor confidence.
📨 Input Events:
chat_msg panel_moderator
"In one minute, tell us how open governance affects city funding prospects."
Ready for Testing
Scene Order 4: Long-form: Post-audit policy brief
ID: audit-brief-long
🎯 Goal: Produce a clear, 500-word policy brief summarizing audit findings and three actionable anti-corruption measures for the city council.
📨 Input Events:
chat_msg council_chair
"Please send a brief summarizing your audit results for tomorrow’s council session."
Ready for Testing
Scene Order 5: Long-form: Implementation roadmap
ID: portal-plan-long
🎯 Goal: Draft an 800-word step-by-step roadmap for launching a citywide open-data portal within six months, including milestones and KPIs.
📨 Input Events:
chat_msg cio_garcia
"We approved the budget. Can you outline the full rollout plan for the open-data portal?"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • mistralai/mistral-7b-in… 93 ms (p95 208 ms, avg 110 ms, N 18)
  • meta-llama/llama-3.1-8b… 94 ms (p95 274 ms, avg 118 ms, N 18)
  • qwen/qwen-2.5-7b-instru… 98 ms (p95 184 ms, avg 109 ms, N 16)
  • qwen/qwen3-8b 110 ms (p95 160 ms, avg 117 ms, N 18)
  • qwen/qwen3-14b 118 ms (p95 237 ms, avg 132 ms, N 18)
Slowest
  • [email protected]/Qw… 9867 ms (p95 11014 ms, avg 8939 ms, N 6)
  • [email protected]/Qw… 5123 ms (p95 7754 ms, avg 5345 ms, N 6)
  • qwen/qwen3-14b 118 ms (p95 237 ms, avg 132 ms, N 18)
  • qwen/qwen3-8b 110 ms (p95 160 ms, avg 117 ms, N 18)
  • qwen/qwen-2.5-7b-instru… 98 ms (p95 184 ms, avg 109 ms, N 16)
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
Recent Runs
17661483: Dec. 17, 2025, 12:02 a.m.
40264360: Dec. 16, 2025, 12:02 a.m.
09756427: Dec. 15, 2025, 12:02 a.m.
13093079: Dec. 14, 2025, 12:02 a.m.
11086742: Dec. 13, 2025, 12:02 a.m.
31656347: Dec. 12, 2025, 12:02 a.m.
24601610: Dec. 11, 2025, 12:02 a.m.
14252569: Dec. 10, 2025, 12:02 a.m.
31268530: Dec. 9, 2025, 12:02 a.m.
17618574: Dec. 8, 2025, 12:02 a.m.
Latency Overview (This Suite)