Dr. George Mason

education-academia-phd-researcher-characters-helen-keller v2.0 Ethical

Backstory: George is a disability studies scholar whose dissertation critiqued the accessibility of online learning platforms for visually impaired users, blending qualitative interviews with UX testing. He routinely consults for ed-tech startups, championing inclusive design that accounts for intersecting identities such as race, class, and gender. An outspoken advocate, he balances rigorous academic analysis with pragmatic guidance for product teams.

100% Complete

6/6 scenes

Model Performance Overview

Scene Performance Matrix

Scene	meta-llama/llama-3.…	mistralai/mistral-7…	[email protected]…	[email protected]…	qwen/qwen-2.5-7b-in…	qwen/qwen3-14b	qwen/qwen3-8b
`intro` Brief self-introduction	0.874	0.670	0.000 (Error)	0.000 (Error)	0.590	0.890	0.716
`strategies` Intersectional accessibility advice	0.675	0.545	0.000 (Error)	0.000 (Error)	0.409	0.720	0.514
`scroll-check` UX concern: infinite scroll	0.000 (Error)	0.623	0.000 (Error)	0.000 (Error)	0.010	0.622	0.634
`audit-summary` Long-form executive summary	0.207	0.640	0.000 (Error)	0.000 (Error)	0.154	0.000	0.525
`keynote-opening` Long-form keynote opening	0.000	0.421	0.000 (Error)	0.000 (Error)	0.288	0.427	0.706
`review-promise` Future review commitment	0.732	0.790	0.000 (Error)	0.000 (Error)	0.692	0.863	0.686
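
To compare models across the whole suite, the per-scene scores in the matrix can be averaged. The following is a minimal sketch, not the dashboard's own aggregation: the unweighted mean, the function names `mean_scores` and `rank_models`, and the decision to leave out the two columns that errored on every scene are all assumptions made for illustration. Model names are kept truncated exactly as the matrix shows them.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in scene order:
# intro, strategies, scroll-check, audit-summary, keynote-opening, review-promise.
# Model names are truncated exactly as displayed. The two columns that
# returned "Error" for every scene are omitted (an assumption of this sketch).
SCORES = {
    "meta-llama/llama-3.…": [0.874, 0.675, 0.000, 0.207, 0.000, 0.732],
    "mistralai/mistral-7…": [0.670, 0.545, 0.623, 0.640, 0.421, 0.790],
    "qwen/qwen-2.5-7b-in…": [0.590, 0.409, 0.010, 0.154, 0.288, 0.692],
    "qwen/qwen3-14b": [0.890, 0.720, 0.622, 0.000, 0.427, 0.863],
    "qwen/qwen3-8b": [0.716, 0.514, 0.634, 0.525, 0.706, 0.686],
}


def mean_scores(scores):
    """Return each model's unweighted mean score, rounded to 3 places."""
    return {model: round(mean(vals), 3) for model, vals in scores.items()}


def rank_models(scores):
    """Return (model, mean) pairs sorted from highest to lowest mean."""
    return sorted(mean_scores(scores).items(), key=lambda kv: kv[1], reverse=True)


for model, avg in rank_models(SCORES):
    print(f"{model:<24} {avg:.3f}")
```

Zero scores are counted as-is here, so a single failed scene pulls a model's mean down sharply; excluding them instead would be a different, equally defensible choice.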

Test Scenes 6

Brief self-introduction

ID: intro

🎯 Goal:

Introduce himself, mention research focus and consulting work, maintain advocacy tone.

📨 Input Events:

chat_msg viewer:user_1

"Who are you and what do you do?"

Ready for Testing

Intersectional accessibility advice

ID: strategies

🎯 Goal:

Offer concrete, intersectional strategies for low-vision ESL learners using video content.

📨 Input Events:

chat_msg viewer:course_designer

"We need to make our video lessons accessible for students who are both low-vision and ESL. What combined strategies would you recommend?"

Ready for Testing

UX concern: infinite scroll

ID: scroll-check

🎯 Goal:

Identify accessibility issues with infinite scroll and suggest fixes.

📨 Input Events:

chat_msg client:startup_pm

"Our ed-tech platform uses infinite scroll on course catalogs. Any accessibility concerns?"

Ready for Testing

Long-form executive summary

ID: audit-summary

🎯 Goal:

Produce a ~250-word, 3-paragraph executive summary of an accessibility audit for LearnFast.io, highlighting key findings and recommendations.

📨 Input Events:

chat_msg client:ceo

"Draft a 3-paragraph executive summary (around 250 words) of your accessibility audit for LearnFast.io."

Ready for Testing

Long-form keynote opening

ID: keynote-opening

🎯 Goal:

Write an inspiring ~500-word keynote opening that motivates designers to commit to inclusive design, reflecting his advocacy tone.

📨 Input Events:

chat_msg event:conference_host

"Please craft a 500-word opening for your upcoming keynote that will inspire UX designers to prioritize inclusive design."

Ready for Testing

Future review commitment

ID: review-promise

🎯 Goal:

Agree to review the platform next month, clearly state the promise and schedule.

📨 Input Events:

chat_msg client:startup_pm

"Could you review our new feature set next month? We can compensate you for the time."

Ready for Testing

Latency by Model (This Suite)

Fastest

Model	Latency	p95	avg	N
[email protected]/Qw…	10077 ms	17538 ms	10784 ms	6
meta-llama/llama-3.1-8b…	22395 ms	30376 ms	22340 ms	6
qwen/qwen3-14b	23837 ms	43458 ms	28121 ms	6
qwen/qwen-2.5-7b-instru…	26549 ms	111370 ms	44945 ms	6
qwen/qwen3-8b	27259 ms	32328 ms	28243 ms	6

Slowest

Model	Latency	p95	avg	N
[email protected]/Qw…	41183 ms	46076 ms	41636 ms	6
mistralai/mistral-7b-in…	33483 ms	42958 ms	34327 ms	6
qwen/qwen3-8b	27259 ms	32328 ms	28243 ms	6
qwen/qwen-2.5-7b-instru…	26549 ms	111370 ms	44945 ms	6
qwen/qwen3-14b	23837 ms	43458 ms	28121 ms	6

Per-scene duration for this suite.

Evaluation Schema

Enhanced Framework

Version v2 ACTIVE

0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions

Character Authenticity	0.182
Plan Validity	0.155
Contextual Intelligence	0.136

Recent Runs

21181791	Dec. 17, 2025, 12:01 a.m.
34775541	Dec. 16, 2025, 12:01 a.m.
17789975	Dec. 15, 2025, 12:01 a.m.
18949016	Dec. 14, 2025, 12:01 a.m.
18440459	Dec. 13, 2025, 12:01 a.m.
29567312	Dec. 12, 2025, 12:01 a.m.
25590311	Dec. 11, 2025, 12:01 a.m.
18726200	Dec. 10, 2025, 12:01 a.m.
29394796	Dec. 9, 2025, 12:01 a.m.
19799001	Dec. 8, 2025, 12:01 a.m.