Dr. Alicia Parker
medicine-healthcare-psychology-human-behavior-behavioral-researcher-characters-b-f-skinner
v2.0
Ethical
Backstory: Dr. Parker is a mid-career behavioral health researcher at a large metropolitan university medical center. She blends clinical psychology with public-health methods to study how digital tools boost adherence to long-term treatment plans. Raised bilingual, she champions multicultural collaboration and rigorously inclusive research practices.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| measure-adherence: Selecting an adherence metric | 0.471 | 0.714 | 0.444 | 0.521 | 0.023 | 0.528 | 0.565 | 0.321 | 0.000 (Error) | 0.648 | 0.471 | 0.597 | 0.770 |
| cultural-barriers: Addressing cultural barriers | 0.535 | 0.630 | 0.628 | 0.000 | 0.000 | 0.764 | 0.584 | 0.000 (Error) | 0.000 (Error) | 0.641 | 0.525 | 0.668 | 0.661 |
| podcast-episode-script: Podcast script on digital adherence tools | 0.510 | 0.647 | 0.299 | 0.325 | 0.000 | 0.288 | 0.661 | 0.000 (Error) | 0.000 (Error) | 0.786 | 0.000 | 0.324 | 0.564 |
| grant-abstract: NIH R01 abstract | 0.634 | 0.685 | 0.664 | 0.000 | 0.000 (Error) | 0.404 | 0.650 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.300 | 0.667 | 0.648 |
Test Scenes 4
Scene Order: 0
Selecting an adherence metric
ID: measure-adherence
🎯 Goal: Offer a concise, evidence-based recommendation for measuring medication adherence in Spanish-speaking adults with diabetes, reflecting cultural sensitivity.
📨 Input Events: chat_msg • viewer:user_1
"What adherence metric would you suggest for our study on Spanish-speaking diabetics?"
Ready for Testing
Scene Order: 1
Addressing cultural barriers
ID: cultural-barriers
🎯 Goal: Identify two likely cultural barriers to digital intervention uptake and propose brief mitigation strategies.
📨 Input Events: chat_msg • viewer:user_2
"Any quick thoughts on cultural barriers we should prepare for when rolling out our app?"
Ready for Testing
Scene Order: 2
Podcast script on digital adherence tools
ID: podcast-episode-script
🎯 Goal: Draft a 500-word podcast segment that explains how push-notification timing influences behavior change, maintaining a professional yet accessible voice.
📨 Input Events: chat_msg • producer:podcast_host
"Can you script a segment on notification timing and adherence?"
Ready for Testing
Scene Order: 3
NIH R01 abstract
ID: grant-abstract
🎯 Goal: Write a 300-word NIH-style abstract outlining a randomized controlled trial of a bilingual SMS program to improve hypertension self-management.
📨 Input Events: chat_msg • colleague:dr_lee
"Need an abstract for our R01 on bilingual SMS for hypertension—300 words."
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 12037 ms (p95 15976 ms, avg 12354 ms, N 4)
- qwen/qwen3-14b: 16450 ms (p95 25859 ms, avg 18822 ms, N 4)
- neversleep/noromaid-20b: 18591 ms (p95 32066 ms, avg 17028 ms, N 7)
- qwen/qwen-2.5-7b-instru…: 18925 ms (p95 107613 ms, avg 38488 ms, N 6)
- google/gemini-2.5-flash: 19132 ms (p95 24850 ms, avg 20029 ms, N 7)
Slowest
- microsoft/phi-3-medium-…: 170719 ms (p95 222643 ms, avg 167641 ms, N 8)
- [email protected]/Qw…: 38688 ms (p95 40778 ms, avg 38843 ms, N 4)
- microsoft/phi-3.5-mini-…: 37974 ms (p95 179499 ms, avg 63752 ms, N 7)
- deepseek/deepseek-r1-di…: 30228 ms (p95 41359 ms, avg 31788 ms, N 5)
- qwen/qwen3-8b: 24605 ms (p95 28659 ms, avg 24806 ms, N 8)
Per-scene duration for this suite.
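As a rough illustration of how the p95, avg, and N figures in a latency summary like this could be derived from raw per-scene durations, here is a minimal Python sketch. It assumes a nearest-rank percentile; the dashboard's actual percentile method is not stated on this page, and the function name `summarize_latency` and the sample durations are hypothetical.

```python
import math
from statistics import mean


def summarize_latency(durations_ms):
    """Return (p95, avg, n) for a list of per-scene durations in milliseconds.

    Assumption: nearest-rank percentile. The dashboard may interpolate instead.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # Nearest rank: the smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    p95 = ordered[rank - 1]
    return p95, mean(ordered), len(ordered)


# Hypothetical durations, for illustration only (not taken from the suite).
p95, avg, n = summarize_latency([100, 200, 300, 400])
print(f"p95 {p95} ms, avg {avg} ms, N {n}")
```

With small sample counts such as the N values of 4 to 8 reported here, a nearest-rank p95 is simply the largest observed duration, so p95 and the maximum coincide.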
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
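The page does not say how dimension weights are combined into a scene score. One plausible reading, sketched below in Python, is a weighted average normalized by the weights of the dimensions that were actually scored. This is an assumption, not the framework's documented formula; `weighted_score` and the example per-dimension scores are hypothetical, and only the three weights listed above are taken from the page.

```python
def weighted_score(scores, weights):
    """Weighted average of per-dimension scores.

    Assumption: normalize by the total weight of the dimensions present in
    `scores`, so partial weight sets still yield a value in the score range.
    """
    used = [name for name in weights if name in scores]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(scores[name] * weights[name] for name in used) / total_weight


# Weights as listed on the page; the remaining dimensions are not shown there.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

# Hypothetical per-dimension scores, for illustration only.
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Normalizing by the weights in use keeps the result comparable even though the three listed weights sum to less than 1.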
Recent Runs
- 33613293: Dec. 17, 2025, midnight
- 38773436: Dec. 16, 2025, midnight
- 31462775: Dec. 15, 2025, midnight
- 34283794: Dec. 14, 2025, midnight
- 31364152: Dec. 13, 2025, midnight
- 37933017: Dec. 12, 2025, midnight
- 32527820: Dec. 11, 2025, midnight
- 32102623: Dec. 10, 2025, midnight
- 36423822: Dec. 9, 2025, midnight
- 32414042: Dec. 8, 2025, midnight