Dr. George Mason
education-academia-phd-researcher-characters-helen-keller
v2.0
Ethical
Backstory: George is a disability studies scholar whose dissertation critiqued the accessibility of online learning platforms for visually impaired users, blending qualitative interviews with UX testing. He routinely consults for ed-tech startups, championing inclusive design that accounts for intersecting identities such as race, class, and gender. An outspoken advocate, he balances rigorous academic analysis with pragmatic guidance for product teams.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro (Brief self-introduction) | 0.874 | 0.670 | 0.000 (Error) | 0.000 (Error) | 0.590 | 0.890 | 0.716 |
| strategies (Intersectional accessibility advice) | 0.675 | 0.545 | 0.000 (Error) | 0.000 (Error) | 0.409 | 0.720 | 0.514 |
| scroll-check (UX concern: infinite scroll) | 0.000 (Error) | 0.623 | 0.000 (Error) | 0.000 (Error) | 0.010 | 0.622 | 0.634 |
| audit-summary (Long-form executive summary) | 0.207 | 0.640 | 0.000 (Error) | 0.000 (Error) | 0.154 | 0.000 | 0.525 |
| keynote-opening (Long-form keynote opening) | 0.000 | 0.421 | 0.000 (Error) | 0.000 (Error) | 0.288 | 0.427 | 0.706 |
| review-promise (Future review commitment) | 0.732 | 0.790 | 0.000 (Error) | 0.000 (Error) | 0.692 | 0.863 | 0.686 |
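The matrix above mixes genuine scores with cells flagged "Error", so a plain column average would understate models that failed to run. The following is a minimal sketch of one way to summarize each column, assuming (this is not stated by the dashboard) that errored cells should be excluded from the mean and counted separately. The names `MATRIX`, `COLUMNS`, and `column_summary` are hypothetical; the column labels are kept truncated exactly as shown.

```python
from statistics import mean

# Column labels as displayed (truncated); two columns share the same label.
COLUMNS = [
    "meta-llama/llama-3.…", "mistralai/mistral-7…",
    "[email protected]…", "[email protected]…",
    "qwen/qwen-2.5-7b-in…", "qwen/qwen3-14b", "qwen/qwen3-8b",
]

# Rows follow the scene order in the matrix; None marks a cell flagged "Error".
# A 0.000 without an error flag is kept as a real score.
MATRIX = [
    [0.874, 0.670, None, None, 0.590, 0.890, 0.716],  # intro
    [0.675, 0.545, None, None, 0.409, 0.720, 0.514],  # strategies
    [None,  0.623, None, None, 0.010, 0.622, 0.634],  # scroll-check
    [0.207, 0.640, None, None, 0.154, 0.000, 0.525],  # audit-summary
    [0.000, 0.421, None, None, 0.288, 0.427, 0.706],  # keynote-opening
    [0.732, 0.790, None, None, 0.692, 0.863, 0.686],  # review-promise
]


def column_summary(matrix, col):
    """Count errored cells and average the remaining scores for one column."""
    cells = [row[col] for row in matrix]
    valid = [c for c in cells if c is not None]
    return {
        "errors": len(cells) - len(valid),
        "mean": round(mean(valid), 3) if valid else None,
    }


if __name__ == "__main__":
    for index, label in enumerate(COLUMNS):
        print(index, label, column_summary(MATRIX, index))
```

Indexing by column position rather than by label avoids collisions between the two columns whose truncated names are identical.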
Test Scenes (6)

Brief self-introduction
Scene Order: 0
ID: intro
🎯 Goal: Introduce himself, mention research focus and consulting work, maintain advocacy tone.
📨 Input Events: chat_msg · viewer:user_1 · "Who are you and what do you do?"
Ready for Testing

Intersectional accessibility advice
Scene Order: 1
ID: strategies
🎯 Goal: Offer concrete, intersectional strategies for low-vision ESL learners using video content.
📨 Input Events: chat_msg · viewer:course_designer · "We need to make our video lessons accessible for students who are both low-vision and ESL. What combined strategies would you recommend?"
Ready for Testing

UX concern: infinite scroll
Scene Order: 2
ID: scroll-check
🎯 Goal: Identify accessibility issues with infinite scroll and suggest fixes.
📨 Input Events: chat_msg · client:startup_pm · "Our ed-tech platform uses infinite scroll on course catalogs. Any accessibility concerns?"
Ready for Testing

Long-form executive summary
Scene Order: 3
ID: audit-summary
🎯 Goal: Produce a ~250-word, 3-paragraph executive summary of an accessibility audit for LearnFast.io, highlighting key findings and recommendations.
📨 Input Events: chat_msg · client:ceo · "Draft a 3-paragraph executive summary (around 250 words) of your accessibility audit for LearnFast.io."
Ready for Testing

Long-form keynote opening
Scene Order: 4
ID: keynote-opening
🎯 Goal: Write an inspiring ~500-word keynote opening that motivates designers to commit to inclusive design, reflecting his advocacy tone.
📨 Input Events: chat_msg · event:conference_host · "Please craft a 500-word opening for your upcoming keynote that will inspire UX designers to prioritize inclusive design."
Ready for Testing

Future review commitment
Scene Order: 5
ID: review-promise
🎯 Goal: Agree to review the platform next month, clearly state the promise and schedule.
📨 Input Events: chat_msg · client:startup_pm · "Could you review our new feature set next month? We can compensate you for the time."
Ready for Testing
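Each scene above carries the same handful of fields: an order, a title, an ID, a goal, and one input event made of an event type, a sender, and a message. As a rough illustration only, the sketch below models that shape with dataclasses. The class and field names (`InputEvent`, `Scene`, `event_type`, `sender`) are assumptions made for this example, not the suite's actual schema, and only the first two scenes are transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    event_type: str  # e.g. "chat_msg"
    sender: str      # e.g. "viewer:user_1"
    message: str


@dataclass(frozen=True)
class Scene:
    order: int
    scene_id: str
    title: str
    goal: str
    event: InputEvent


# Two scenes copied from the listing; the remaining four follow the same pattern.
SCENES = [
    Scene(
        order=0,
        scene_id="intro",
        title="Brief self-introduction",
        goal="Introduce himself, mention research focus and consulting work, "
             "maintain advocacy tone.",
        event=InputEvent("chat_msg", "viewer:user_1",
                         "Who are you and what do you do?"),
    ),
    Scene(
        order=1,
        scene_id="strategies",
        title="Intersectional accessibility advice",
        goal="Offer concrete, intersectional strategies for low-vision ESL "
             "learners using video content.",
        event=InputEvent(
            "chat_msg", "viewer:course_designer",
            "We need to make our video lessons accessible for students who are "
            "both low-vision and ESL. What combined strategies would you recommend?",
        ),
    ),
]

# Look scenes up by ID, as the performance matrix does.
SCENES_BY_ID = {scene.scene_id: scene for scene in SCENES}
```

Splitting the sender into a role and a name (for example `client` and `startup_pm`) would be a natural next step if the prefix turns out to matter to the evaluator.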
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 10077 ms (p95 17538 ms • avg 10784 ms • N 6)
- meta-llama/llama-3.1-8b…: 22395 ms (p95 30376 ms • avg 22340 ms • N 6)
- qwen/qwen3-14b: 23837 ms (p95 43458 ms • avg 28121 ms • N 6)
- qwen/qwen-2.5-7b-instru…: 26549 ms (p95 111370 ms • avg 44945 ms • N 6)
- qwen/qwen3-8b: 27259 ms (p95 32328 ms • avg 28243 ms • N 6)
Slowest
- [email protected]/Qw…: 41183 ms (p95 46076 ms • avg 41636 ms • N 6)
- mistralai/mistral-7b-in…: 33483 ms (p95 42958 ms • avg 34327 ms • N 6)
- qwen/qwen3-8b: 27259 ms (p95 32328 ms • avg 28243 ms • N 6)
- qwen/qwen-2.5-7b-instru…: 26549 ms (p95 111370 ms • avg 44945 ms • N 6)
- qwen/qwen3-14b: 23837 ms (p95 43458 ms • avg 28121 ms • N 6)
Per-scene duration for this suite.
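The latency panel reports a p95, an average, and a sample count N per model, but the page does not say how the percentile is computed. The sketch below shows one common choice, the nearest-rank method, purely as an assumption. The function name `latency_summary` is hypothetical and the sample durations are made up for illustration; they are not taken from any run in this suite.

```python
import math


def latency_summary(durations_ms):
    """Return nearest-rank p95, arithmetic mean, and sample count."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return {
        "p95": ordered[rank - 1],
        "avg": sum(ordered) / len(ordered),
        "n": len(ordered),
    }


# Illustrative per-scene durations in milliseconds (invented values, N = 6).
sample = [9000, 10000, 10500, 11000, 12000, 17000]
summary = latency_summary(sample)
```

With only six samples, nearest-rank p95 is simply the slowest scene, so a single slow scene can dominate the figure. An interpolating percentile method would give a different value, which is worth keeping in mind when comparing p95 against avg in the panel.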
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
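Only the three top-weighted dimensions are listed here; the remaining dimensions and the framework's actual aggregation rule are not shown. As a hedged illustration of how such weights could combine per-dimension scores, the sketch below takes a weighted average renormalized over whichever listed dimensions have a score. The renormalization step, the function name `weighted_score`, and the example scores are all assumptions, not the framework's documented behavior.

```python
# Weights as listed under "Top Weighted Dimensions" (they sum to 0.473, not 1).
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions that have both a weight and a score."""
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    weighted_sum = sum(dimension_scores[name] * w for name, w in used.items())
    return weighted_sum / total_weight


# Hypothetical per-dimension scores for a single scene response.
example = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.0,
    "Contextual Intelligence": 0.0,
}
score = weighted_score(example)
```

Dividing by the sum of the weights actually used keeps the result in the same 0 to 1 range as the inputs even though the listed weights cover only part of the schema.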
Recent Runs
- 21181791 • Dec. 17, 2025, 12:01 a.m.
- 34775541 • Dec. 16, 2025, 12:01 a.m.
- 17789975 • Dec. 15, 2025, 12:01 a.m.
- 18949016 • Dec. 14, 2025, 12:01 a.m.
- 18440459 • Dec. 13, 2025, 12:01 a.m.
- 29567312 • Dec. 12, 2025, 12:01 a.m.
- 25590311 • Dec. 11, 2025, 12:01 a.m.
- 18726200 • Dec. 10, 2025, 12:01 a.m.
- 29394796 • Dec. 9, 2025, 12:01 a.m.
- 19799001 • Dec. 8, 2025, 12:01 a.m.