Jamal Rahman
education-academia-phd-researcher-characters-albert-einstein
v2.0
Ethical
Backstory: Jamal is a bilingual educational policy researcher completing a dissertation on how different community college funding models affect long-term success for immigrant students. A big-picture thinker with sharp data-analytics skills, he regularly briefs state legislators and collaborates with non-profits to turn research into practice. His speaking style balances evidence-driven rigor with accessible explanations for diverse audiences.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| legislator-elevator-pitch (Three-sentence pitch to a state legislator) | 0.641 | 0.900 | 0.000 (Error) | 0.000 (Error) | 0.591 | 0.651 | 0.575 |
| nonprofit-data-point (Non-profit seeks a key statistic) | 0.000 | 0.400 | 0.000 (Error) | 0.000 (Error) | 0.330 | 0.155 | 0.606 |
| policy-memo-longform (Full policy memo for committee review) | 0.000 | 0.504 | 0.000 (Error) | 0.000 (Error) | 0.565 | 0.605 | 0.617 |
| spanish-translation (Translate brief into Spanish) | 0.197 | 0.610 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.301 | 0.660 |
| limitations-discussion (Address methodological limitations) | 0.475 | 0.581 | 0.000 (Error) | 0.000 (Error) | 0.364 | 0.579 | 0.593 |
| podcast-script-longform (Podcast segment script) | 0.021 | 0.445 | 0.000 (Error) | 0.000 (Error) | 0.395 | 0.425 | 0.517 |
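The matrix is easier to compare across models once each column is reduced to a single number. The following is a minimal sketch of one way to do that, not the dashboard's own aggregation: it copies the scores from the matrix, treats every cell flagged "Error" as missing (an assumption; the dashboard shows those cells as 0.000), and reports the mean over the remaining scenes plus an error count. The names `MODELS`, `SCORES`, and `summarize` are hypothetical.

```python
from statistics import mean

# Column labels copied from the matrix header; truncated names are left truncated.
# The two identical "[email protected]…" columns are told apart by position only.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected]… (column 3)",
    "[email protected]… (column 4)",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# Scores per scene, in column order. None marks a cell flagged "Error".
# A plain 0.000 without an error flag is kept as a real score of zero.
SCORES = {
    "legislator-elevator-pitch": [0.641, 0.900, None, None, 0.591, 0.651, 0.575],
    "nonprofit-data-point": [0.000, 0.400, None, None, 0.330, 0.155, 0.606],
    "policy-memo-longform": [0.000, 0.504, None, None, 0.565, 0.605, 0.617],
    "spanish-translation": [0.197, 0.610, None, None, None, 0.301, 0.660],
    "limitations-discussion": [0.475, 0.581, None, None, 0.364, 0.579, 0.593],
    "podcast-script-longform": [0.021, 0.445, None, None, 0.395, 0.425, 0.517],
}


def summarize(scores=SCORES, models=MODELS):
    """Return {model: (mean over non-error scenes or None, error count)}."""
    summary = {}
    for col, model in enumerate(models):
        cells = [row[col] for row in scores.values()]
        valid = [c for c in cells if c is not None]
        summary[model] = (mean(valid) if valid else None, len(cells) - len(valid))
    return summary


if __name__ == "__main__":
    for model, (avg, errors) in summarize().items():
        shown = "n/a" if avg is None else f"{avg:.3f}"
        print(f"{model}: mean={shown}, errors={errors}")
```

Under this reading, qwen/qwen3-8b has the highest mean (about 0.595), followed by the mistralai column (about 0.573). Counting error cells as zeros instead would lower the qwen-2.5 column's mean and leave the two all-error columns at 0.000.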
Test Scenes 6
Scene Order: 0
Three-sentence pitch to a state legislator
ID: legislator-elevator-pitch
🎯 Goal: Deliver a clear, persuasive elevator pitch (max three sentences) summarizing Jamal’s key dissertation finding and its policy relevance.
📨 Input Events: chat_msg (viewer:rep_sam)
"I’ve only got a minute before my next vote—what’s the headline finding of your research?"
Ready for Testing
Scene Order: 1
Non-profit seeks a key statistic
ID: nonprofit-data-point
🎯 Goal: Provide one precise, sourced statistic from Jamal’s dataset and briefly explain its significance in one sentence.
📨 Input Events: chat_msg (viewer:nonprofit_director)
"Can you give me a concrete number we can cite in our grant proposal?"
Ready for Testing
Scene Order: 2
Full policy memo for committee review
ID: policy-memo-longform
🎯 Goal: Write a structured policy memo of 500–600 words that includes an executive summary, background, three actionable recommendations, and a short bibliography.
📨 Input Events: chat_msg (viewer:committee_clerk)
"The committee wants a full memo by tomorrow. Can you draft it now?"
Ready for Testing
Scene Order: 3
Translate brief into Spanish
ID: spanish-translation
🎯 Goal: Translate Jamal’s previously delivered three-sentence pitch into clear, professional Spanish without altering meaning.
📨 Input Events: chat_msg (viewer:rep_sam)
"Our Latino caucus needs that pitch in Spanish, please."
Ready for Testing
Scene Order: 4
Address methodological limitations
ID: limitations-discussion
🎯 Goal: List two key limitations of Jamal’s study and suggest one way future research could address each, all within 120 words.
📨 Input Events: chat_msg (viewer:grad_peer)
"Reviewers are asking about your study’s limitations—what are the main ones?"
Ready for Testing
Scene Order: 5
Podcast segment script
ID: podcast-script-longform
🎯 Goal: Draft a conversational script (~2 minutes spoken, about 300 words) explaining the study’s findings to a general audience; include at least one statistic and one real-world anecdote.
📨 Input Events: chat_msg (viewer:podcast_host)
"Let’s prep your segment—give me a script that feels engaging for listeners who aren’t policy wonks."
Ready for Testing
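Several of the scene goals above state hard length limits: at most three sentences for the pitch, 500–600 words for the memo, and 120 words for the limitations answer. The sketch below shows how such limits could be checked mechanically before any quality scoring. It is an illustration only: the page does not say how the evaluation enforces these limits, and `LIMITS`, `word_count`, `sentence_count`, and `check_length` are hypothetical names. The sentence splitter is deliberately naive and will miscount text containing abbreviations such as "Dr." or decimals followed by a space.

```python
import re

# Hard limits copied from the scene goals; the dict layout itself is hypothetical.
LIMITS = {
    "legislator-elevator-pitch": {"max_sentences": 3},
    "policy-memo-longform": {"min_words": 500, "max_words": 600},
    "limitations-discussion": {"max_words": 120},
}


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def sentence_count(text):
    """Naive count: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def check_length(scene_id, text):
    """Return a list of human-readable violations (empty if none apply)."""
    limits = LIMITS.get(scene_id, {})
    words, sentences = word_count(text), sentence_count(text)
    problems = []
    if "max_sentences" in limits and sentences > limits["max_sentences"]:
        problems.append(f"{sentences} sentences, limit {limits['max_sentences']}")
    if "min_words" in limits and words < limits["min_words"]:
        problems.append(f"{words} words, minimum {limits['min_words']}")
    if "max_words" in limits and words > limits["max_words"]:
        problems.append(f"{words} words, maximum {limits['max_words']}")
    return problems
```

The podcast scene's "about 300 words" target is left out on purpose, since any tolerance around it would be a guess.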
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 6302 ms (p95 11377 ms, avg 7296 ms, N 6)
- qwen/qwen-2.5-7b-instru…: 17843 ms (p95 19086 ms, avg 15252 ms, N 6)
- meta-llama/llama-3.1-8b…: 23757 ms (p95 77779 ms, avg 35285 ms, N 6)
- qwen/qwen3-14b: 23815 ms (p95 33152 ms, avg 25777 ms, N 6)
- qwen/qwen3-8b: 25116 ms (p95 33444 ms, avg 26288 ms, N 6)
Slowest
- [email protected]/Qw…: 41215 ms (p95 189718 ms, avg 73624 ms, N 6)
- mistralai/mistral-7b-in…: 27270 ms (p95 30768 ms, avg 26505 ms, N 6)
- qwen/qwen3-8b: 25116 ms (p95 33444 ms, avg 26288 ms, N 6)
- qwen/qwen3-14b: 23815 ms (p95 33152 ms, avg 25777 ms, N 6)
- meta-llama/llama-3.1-8b…: 23757 ms (p95 77779 ms, avg 35285 ms, N 6)
Per-scene duration for this suite.
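For readers unfamiliar with the p95 and avg columns, here is a small sketch of how such figures can be derived from six per-scene durations. It assumes the nearest-rank percentile definition, which the dashboard does not confirm, and the sample durations are hypothetical values, not numbers from this suite. The helper name `nearest_rank_percentile` is also hypothetical.

```python
import math
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * n)."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical per-scene durations in milliseconds (NOT taken from the dashboard).
durations_ms = [5200, 6100, 6400, 7000, 8300, 12000]

p95 = nearest_rank_percentile(durations_ms, 95)
avg = mean(durations_ms)

if __name__ == "__main__":
    print(f"p95={p95} ms, avg={avg} ms, N={len(durations_ms)}")
```

With only six samples, the nearest-rank p95 is simply the slowest scene, so a single slow run can pull p95 far above avg. Interpolating methods would instead return a value between the two slowest scenes.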
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 (ACTIVE), 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
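The three weights listed here sum to 0.473, so the remaining weight presumably sits on dimensions this page does not show. As a rough illustration of how weighted dimensions can be combined into one scene score, the sketch below takes a weighted average over whichever dimensions have scores and renormalizes by the weights actually used. The formula is an assumption (the page does not state how scores are aggregated), the per-dimension scores are hypothetical, and `weighted_score` is a hypothetical helper.

```python
# Weights copied from the "Top Weighted Dimensions" list; other dimensions are unknown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over scored dimensions, renormalized by the weights used."""
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total_weight


# Hypothetical per-dimension scores for a single response (NOT from the dashboard).
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.5,
}

if __name__ == "__main__":
    print(f"weighted score: {weighted_score(example_scores):.3f}")
```

Renormalizing keeps the result on the same 0 to 1 scale as the inputs even when only a subset of dimensions is available; a plain weighted sum over these three weights would top out at 0.473.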
Recent Runs
- 20463864: Dec. 17, 2025, 12:01 a.m.
- 33800248: Dec. 16, 2025, 12:01 a.m.
- 17012529: Dec. 15, 2025, 12:01 a.m.
- 18141238: Dec. 14, 2025, 12:01 a.m.
- 17672501: Dec. 13, 2025, 12:01 a.m.
- 28801487: Dec. 12, 2025, 12:01 a.m.
- 24664273: Dec. 11, 2025, 12:01 a.m.
- 17920411: Dec. 10, 2025, 12:01 a.m.
- 28451636: Dec. 9, 2025, 12:01 a.m.
- 18994744: Dec. 8, 2025, 12:01 a.m.