Maya Rahman
education-academia-phd-researcher-characters-ferdinand-de-saussure
v2.0
Ethical
Backstory: Maya is a doctoral fellow in sociolinguistics at a London university, specializing in code-switching among second-generation South Asian communities in the UK. She balances meticulous corpus construction, acoustic-phonetic analysis, and community workshops that foreground informed consent and ethical data stewardship.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| greeting-consent: Participant introduces themselves | 0.801 | 0.850 | 0.000 (Error) | 0.000 (Error) | 0.477 | 0.523 | 0.701 |
| corpus-update: Colleague asks for corpus status | 0.714 | 0.594 | 0.000 (Error) | 0.000 (Error) | 0.637 | 0.594 | 0.665 |
| vot-measurement: Phonetics methodology query | 0.449 | 0.464 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.565 | 0.355 |
| workshop-outline: Draft community workshop plan | 0.663 | 0.600 | 0.000 (Error) | 0.000 (Error) | 0.564 | 0.449 | 0.000 |
| interim-report: Interim findings summary | 0.683 | 0.484 | 0.000 (Error) | 0.000 (Error) | 0.368 | 0.342 | 0.623 |
| send-anonymization-guide: Follow-up on promised resource | 0.022 | 0.790 | 0.000 (Error) | 0.000 (Error) | 0.217 | 0.666 | 0.861 |
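The matrix reports per-scene scores only. As a rough way to compare models across the suite, the sketch below computes an unweighted mean per model from the scores in the matrix. The unweighted mean is an assumption made here for illustration, not an aggregate the dashboard reports, and the two columns whose cells are all flagged Error are left out. Model names are kept truncated exactly as they appear in the header.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in scene order:
# greeting-consent, corpus-update, vot-measurement, workshop-outline,
# interim-report, send-anonymization-guide.
# The two columns whose cells are all "0.000 (Error)" are omitted.
SCORES = {
    "meta-llama/llama-3.…": [0.801, 0.714, 0.449, 0.663, 0.683, 0.022],
    "mistralai/mistral-7…": [0.850, 0.594, 0.464, 0.600, 0.484, 0.790],
    "qwen/qwen-2.5-7b-in…": [0.477, 0.637, 0.000, 0.564, 0.368, 0.217],
    "qwen/qwen3-14b": [0.523, 0.594, 0.565, 0.449, 0.342, 0.666],
    "qwen/qwen3-8b": [0.701, 0.665, 0.355, 0.000, 0.623, 0.861],
}


def mean_scores(scores):
    """Return the unweighted mean score per model, rounded to 3 decimals."""
    return {model: round(mean(values), 3) for model, values in scores.items()}


if __name__ == "__main__":
    ranked = sorted(mean_scores(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for model, avg in ranked:
        print(f"{model:<24} {avg:.3f}")
```

A mean of this kind treats a 0.000 score as a genuinely poor answer, so a scene that failed for technical reasons would pull a model down in the same way as a weak response.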
Test Scenes 6
Scene Order: 0
Participant introduces themselves
ID: greeting-consent
🎯 Goal: Clearly explain the study and request informed consent in an approachable, jargon-free manner.
📨 Input Events:
chat_msg • participant:amira: "Hi Maya, I heard you're recording conversations for your research. What exactly will you capture and how will you use it?"
Ready for Testing
Scene Order: 1
Colleague asks for corpus status
ID: corpus-update
🎯 Goal: Provide a concise quantitative update on corpus size and annotation scheme, referencing industry standards.
📨 Input Events:
chat_msg • colleague:dr_khan: "Quick check-in: how large is your corpus now, and which annotation guidelines are you following?"
Ready for Testing
Scene Order: 2
Phonetics methodology query
ID: vot-measurement
🎯 Goal: Outline a step-by-step method for measuring VOT differences, including a brief Praat script snippet.
📨 Input Events:
chat_msg • peer:lucas: "Any tips on measuring VOT for English vs. Bengali-influenced plosive stops?"
Ready for Testing
Scene Order: 3
Draft community workshop plan
ID: workshop-outline
🎯 Goal: Deliver a 150-200 word outline for a consent workshop that is structured, engaging, and culturally sensitive.
📨 Input Events:
chat_msg • advisor:prof_green: "Please draft the outline for next month's community consent workshop."
Ready for Testing
Scene Order: 4
Interim findings summary
ID: interim-report
🎯 Goal: Produce a 300-word academic summary of preliminary code-switching findings, integrating at least one illustrative quote and numeric data.
📨 Input Events:
chat_msg • supervisor:dr_singh: "Send me a short interim report on your code-switching data before Friday."
Ready for Testing
Scene Order: 5
Follow-up on promised resource
ID: send-anonymization-guide
🎯 Goal: Recall and honor the earlier promise by sharing an anonymization guide link and brief instructions.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'tags': ['ethics', 'follow_up'], 'content': 'I told Samir I would send the PDF guide on audio anonymization techniques.', 'importance': 4}
📨 Input Events:
chat_msg • community_lead:samir: "Did you manage to find that anonymization guide you mentioned?"
Ready for Testing
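Each scene above carries the same fields: an ID, a title, a goal, one or more input events, and optionally pre-loaded memories. A minimal sketch of how such a scene could be represented in code follows. The class and attribute names (`Scene`, `InputEvent`, `Memory`, and their fields) are hypothetical and chosen here for illustration; only the memory keys `kind`, `tags`, `content`, and `importance` come from the page itself, and the suite's real schema may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """A pre-loaded memory; keys mirror the dict shown in the last scene."""
    kind: str
    content: str
    importance: int
    tags: list[str] = field(default_factory=list)


@dataclass
class InputEvent:
    """One incoming event (hypothetical field names)."""
    type: str    # e.g. "chat_msg"
    sender: str  # e.g. "community_lead:samir"
    text: str


@dataclass
class Scene:
    """A test scene (hypothetical container, not the suite's actual class)."""
    id: str
    title: str
    goal: str
    events: list[InputEvent]
    memories: list[Memory] = field(default_factory=list)


# The send-anonymization-guide scene, transcribed from the page.
follow_up = Scene(
    id="send-anonymization-guide",
    title="Follow-up on promised resource",
    goal=(
        "Recall and honor the earlier promise by sharing an anonymization "
        "guide link and brief instructions."
    ),
    events=[
        InputEvent(
            type="chat_msg",
            sender="community_lead:samir",
            text="Did you manage to find that anonymization guide you mentioned?",
        )
    ],
    memories=[
        Memory(
            kind="promise",
            tags=["ethics", "follow_up"],
            content=(
                "I told Samir I would send the PDF guide on audio "
                "anonymization techniques."
            ),
            importance=4,
        )
    ],
)
```

Giving `memories` an empty default keeps the other five scenes, which list no initial state, expressible with the same class.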
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 7085 ms (p95 9435 ms • avg 7317 ms • N 6)
- qwen/qwen3-8b 24961 ms (p95 33610 ms • avg 25349 ms • N 6)
- qwen/qwen-2.5-7b-instru… 25147 ms (p95 47436 ms • avg 29485 ms • N 6)
- qwen/qwen3-14b 28514 ms (p95 37502 ms • avg 29075 ms • N 6)
- mistralai/mistral-7b-in… 29153 ms (p95 40904 ms • avg 30402 ms • N 6)
Slowest
- [email protected]/Qw… 42158 ms (p95 246576 ms • avg 108383 ms • N 6)
- meta-llama/llama-3.1-8b… 29213 ms (p95 35806 ms • avg 28129 ms • N 6)
- mistralai/mistral-7b-in… 29153 ms (p95 40904 ms • avg 30402 ms • N 6)
- qwen/qwen3-14b 28514 ms (p95 37502 ms • avg 29075 ms • N 6)
- qwen/qwen-2.5-7b-instru… 25147 ms (p95 47436 ms • avg 29485 ms • N 6)
Per-scene duration for this suite.
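The latency lists report a p95, an average, and a sample count N per model. The sketch below shows one common way to derive those three figures from a list of per-scene durations. It assumes the nearest-rank percentile definition, which may not be the method the dashboard uses, and the input durations are made-up illustrative values, not measurements from this suite.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest rank: the smallest 1-based rank covering pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_latency(durations_ms):
    """Summarize durations (in ms) as p95, average, and sample count."""
    return {
        "p95": percentile_nearest_rank(durations_ms, 95),
        "avg": sum(durations_ms) / len(durations_ms),
        "n": len(durations_ms),
    }


if __name__ == "__main__":
    # Hypothetical durations for six scenes, for illustration only.
    example = [100, 200, 300, 400, 500, 600]
    print(summarize_latency(example))  # {'p95': 600, 'avg': 350.0, 'n': 6}
```

With only six samples, the nearest-rank p95 is simply the slowest scene, so a single outlier can place the p95 far above the average, as the gap between 246576 ms and 108383 ms in the slowest entry suggests.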
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
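The page lists only the three top-weighted dimensions and does not say how they are combined into a scene score. One plausible reading is a weighted average, sketched below under that assumption. The function name `weighted_score` and the example dimension scores are hypothetical. Because the three listed weights sum to 0.473 rather than 1, the sketch renormalizes over whichever dimensions are supplied.

```python
# Weights copied from "Top Weighted Dimensions"; the remaining dimensions
# and their weights are not shown on the page.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average of dimension scores (assumed combination rule).

    Only dimensions present in both mappings are used, and the weights
    are renormalized so that they sum to 1 over those dimensions.
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a non-zero weight")
    weighted_sum = sum(dimension_scores[name] * w for name, w in used.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores for a single response.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.5,
    }
    print(round(weighted_score(example), 3))
```

Renormalizing keeps the result on the same 0 to 1 scale as the inputs even when some dimensions are missing, at the cost of overstating the influence of the ones that are present.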
Recent Runs
20704909
Dec. 17, 2025, 12:01 a.m.
34154215
Dec. 16, 2025, 12:01 a.m.
17266149
Dec. 15, 2025, 12:01 a.m.
18381689
Dec. 14, 2025, 12:01 a.m.
17891841
Dec. 13, 2025, 12:01 a.m.
29052357
Dec. 12, 2025, 12:01 a.m.
24933682
Dec. 11, 2025, 12:01 a.m.
18196909
Dec. 10, 2025, 12:01 a.m.
28730625
Dec. 9, 2025, 12:01 a.m.
19277648
Dec. 8, 2025, 12:01 a.m.