Elias Markham
tech-billionares-bill-gates
v2.0
Ethical
Backstory: A self-made software entrepreneur, Elias stepped away from daily corporate duties to champion global health and education. He invests his technical acumen and fortune in large-scale humanitarian projects, often referencing the latest scientific literature. Despite a packed schedule, he remains approachable and deeply empathetic toward individual concerns.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| library-donation (Guidance on local library gift) | 0.595 | 0.000 (Error) | 0.832 | 0.000 | 0.000 | 0.706 | 0.880 | 0.390 | 0.634 | 0.000 (Error) | 0.580 | 0.678 | 0.695 | 0.799 | 0.710 | 0.795 |
| public-health-platform-article (Long-form: Designing an open health data platform) | 0.370 | 0.534 | 0.561 | 0.222 | 0.000 | 0.567 | 0.395 | 0.000 (Error) | 0.319 | 0.000 (Error) | 0.416 | 0.395 | 0.406 | 0.291 | 0.237 | 0.000 |
| earthquake-response (Rapid disaster response plan) | 0.815 | 0.702 | 0.697 | 0.528 | 0.000 | 0.650 | 0.749 | 0.000 (Error) | 0.734 | 0.000 (Error) | 0.791 | 0.783 | 0.711 | 0.735 | 0.519 | 0.000 |
| quarterly-board-letter (Long-form: Foundation board update) | 0.509 | 0.533 | 0.557 | 0.460 | 0.000 (Error) | 0.480 | 0.571 | 0.000 (Error) | 0.564 | 0.000 (Error) | 0.461 | 0.578 | 0.635 | 0.541 | 0.600 | 0.565 |
Test Scenes 4
Scene Order: 0
Guidance on local library gift
ID: library-donation
🎯 Goal: Offer concise, actionable advice on maximizing the impact of a $5,000 donation while expressing genuine encouragement.
📨 Input Events: chat_msg from viewer:alice
"Hi Elias, I have $5,000 to donate to my town library. Any tips on making it really count?"
Ready for Testing
Scene Order: 1
Long-form: Designing an open health data platform
ID: public-health-platform-article
🎯 Goal: Write a structured proposal of about 300 words that explains key architectural choices and references at least one recent peer-reviewed study.
📨 Input Events: chat_msg from viewer:techjournal
"Could you contribute a detailed piece on how you'd architect an open public-health data platform for low-resource countries?"
Ready for Testing
Scene Order: 2
Rapid disaster response plan
ID: earthquake-response
🎯 Goal: React swiftly with a clear three-step action plan and express empathy for victims.
📨 Input Events: world_event from newswire
"Breaking: A 7.2 magnitude earthquake has struck the coastal region of Country X, causing widespread damage."
Ready for Testing
Scene Order: 3
Long-form: Foundation board update
ID: quarterly-board-letter
🎯 Goal: Draft a 500-word letter summarizing Q2 progress of the education initiative, highlighting metrics, anecdotes, and next-quarter objectives.
📨 Input Events: chat_msg from assistant:chief_of_staff
"Elias, the board needs your Q2 update letter on the Sierra STEM Schools project."
Ready for Testing
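Each scene card above carries the same fields: an order, an ID, a title, a goal, and one input event with a type, a source, and a message. As a minimal sketch of how one such card could be held in code, the snippet below models the library-donation scene. The class and field names (`Scene`, `InputEvent`, `scene_id`, `kind`, `source`) are assumptions made for illustration; the suite's actual storage format is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    """One input event from a scene card (field names are hypothetical)."""
    kind: str    # event type as listed on the card, e.g. "chat_msg"
    source: str  # who or what emitted it, e.g. "viewer:alice"
    text: str    # the quoted message


@dataclass(frozen=True)
class Scene:
    """One test scene (field names are hypothetical)."""
    order: int
    scene_id: str
    title: str
    goal: str
    events: tuple[InputEvent, ...]


# Values copied from the first scene card on this page.
library_donation = Scene(
    order=0,
    scene_id="library-donation",
    title="Guidance on local library gift",
    goal=(
        "Offer concise, actionable advice on maximizing impact of a "
        "$5,000 donation while expressing genuine encouragement."
    ),
    events=(
        InputEvent(
            kind="chat_msg",
            source="viewer:alice",
            text=(
                "Hi Elias, I have $5,000 to donate to my town library. "
                "Any tips on making it really count?"
            ),
        ),
    ),
)
```

Frozen dataclasses keep a scene definition immutable once built, which suits a fixed test fixture that many model runs share.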
Latency by Model (This Suite)
Fastest
- neversleep/noromaid-20b: 9420 ms (p95 20271 ms • avg 12252 ms • N 4)
- [email protected]/Qw…: 10850 ms (p95 14701 ms • avg 11547 ms • N 4)
- [email protected]/Qw…: 11154 ms (p95 12740 ms • avg 11121 ms • N 4)
- [email protected]/Qw…: 14689 ms (p95 17301 ms • avg 14813 ms • N 4)
- [email protected]/Qw…: 16003 ms (p95 20732 ms • avg 16103 ms • N 4)
Slowest
- microsoft/phi-3-medium-…: 438762 ms (p95 479496 ms • avg 391989 ms • N 23)
- qwen/qwen3-8b: 120323 ms (p95 195876 ms • avg 131348 ms • N 32)
- [email protected]/Qw…: 40185 ms (p95 45437 ms • avg 40797 ms • N 4)
- microsoft/phi-3.5-mini-…: 39401 ms (p95 64645 ms • avg 45533 ms • N 39)
- deepseek/deepseek-r1-di…: 34636 ms (p95 40016 ms • avg 35123 ms • N 25)
Per-scene duration for this suite.
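The p95, avg, and N figures above can be reproduced from raw per-scene durations along the following lines. This is a sketch under stated assumptions: the page does not say which percentile method the dashboard uses, so the nearest-rank definition is chosen here for simplicity, and the sample durations are invented for illustration rather than taken from any run.

```python
import math
from statistics import mean


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest 1-based rank covering at least p% of samples.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latency(samples_ms):
    """Summarize durations (in ms) as p95, average, and sample count."""
    return {
        "p95": percentile_nearest_rank(samples_ms, 95),
        "avg": mean(samples_ms),
        "n": len(samples_ms),
    }


# Hypothetical durations in milliseconds, NOT measurements from this suite.
example_durations_ms = [9000, 10000, 11000, 20000]
summary = summarize_latency(example_durations_ms)
```

With only four samples, as for several models listed here, nearest-rank p95 is simply the slowest run, so p95 on small N is best read as a worst case rather than a stable tail estimate.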
Completion Progress
100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
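To illustrate how dimension weights like these might combine into a single scene score, here is a small sketch of a weighted average. Only the three weights listed above come from this page; the remaining dimensions and the framework's real aggregation rule are not shown, so the normalization over supplied dimensions, the function name `weighted_score`, and the per-dimension scores are all assumptions.

```python
# The three top weights shown on this page; other dimensions are not listed.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights):
    """Weighted average of per-dimension scores in [0, 1].

    Only dimensions present in both mappings are used, and the result is
    normalized by the sum of their weights (an assumed aggregation rule).
    """
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(weights[dim] * scores[dim] for dim in shared) / total_weight


# Hypothetical per-dimension scores for one model response.
example_scores = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
overall = weighted_score(example_scores, TOP_WEIGHTS)
```

Normalizing by the weights actually present keeps the result in the same 0 to 1 range as the inputs even when only a subset of dimensions is scored.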
Recent Runs
- 46261597: Dec. 17, 2025, midnight
- 51679090: Dec. 16, 2025, midnight
- 43036981: Dec. 15, 2025, midnight
- 45253928: Dec. 14, 2025, midnight
- 42772548: Dec. 13, 2025, midnight
- 51726968: Dec. 12, 2025, midnight
- 45284784: Dec. 11, 2025, midnight
- 44260264: Dec. 10, 2025, midnight
- 49666098: Dec. 9, 2025, midnight
- 43732291: Dec. 8, 2025, midnight