Maya Rahman

education-academia-phd-researcher-characters-ferdinand-de-saussure v2.0 Ethical
Backstory: Maya is a doctoral fellow in sociolinguistics at a London university, specializing in code-switching among second-generation South Asian communities in the UK. She balances meticulous corpus construction, acoustic-phonetic analysis, and community workshops that foreground informed consent and ethical data stewardship.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | Description | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
--- | --- | --- | --- | --- | --- | --- | --- | ---
greeting-consent | Participant introduces themselves | 0.801 | 0.850 | 0.000 (Error) | 0.000 (Error) | 0.477 | 0.523 | 0.701
corpus-update | Colleague asks for corpus status | 0.714 | 0.594 | 0.000 (Error) | 0.000 (Error) | 0.637 | 0.594 | 0.665
vot-measurement | Phonetics methodology query | 0.449 | 0.464 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.565 | 0.355
workshop-outline | Draft community workshop plan | 0.663 | 0.600 | 0.000 (Error) | 0.000 (Error) | 0.564 | 0.449 | 0.000
interim-report | Interim findings summary | 0.683 | 0.484 | 0.000 (Error) | 0.000 (Error) | 0.368 | 0.342 | 0.623
send-anonymization-guide | Follow-up on promised resource | 0.022 | 0.790 | 0.000 (Error) | 0.000 (Error) | 0.217 | 0.666 | 0.861
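As a rough aid for reading the matrix, per-model averages can be computed from the scores above. The sketch below is an illustration only and not part of the evaluation framework. It assumes that cells flagged Error are failed runs and should be treated as missing, while unflagged 0.000 cells count as real scores. The function name `mean_per_model` and the "(col 3)" / "(col 4)" suffixes are hypothetical labels added to tell the two identically named columns apart.

```python
from statistics import mean

# Column labels copied as they appear (truncated) in the report.
# The "(col N)" suffixes are hypothetical, added only to disambiguate.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected] (col 3)",
    "[email protected] (col 4)",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# One row per scene; None marks a cell flagged "Error" (assumed missing).
SCORES = {
    "greeting-consent":         [0.801, 0.850, None, None, 0.477, 0.523, 0.701],
    "corpus-update":            [0.714, 0.594, None, None, 0.637, 0.594, 0.665],
    "vot-measurement":          [0.449, 0.464, None, None, 0.000, 0.565, 0.355],
    "workshop-outline":         [0.663, 0.600, None, None, 0.564, 0.449, 0.000],
    "interim-report":           [0.683, 0.484, None, None, 0.368, 0.342, 0.623],
    "send-anonymization-guide": [0.022, 0.790, None, None, 0.217, 0.666, 0.861],
}


def mean_per_model(scores):
    """Average each column over scenes, skipping cells marked None."""
    means = []
    for column in zip(*scores.values()):
        valid = [s for s in column if s is not None]
        means.append(round(mean(valid), 3) if valid else None)
    return means


if __name__ == "__main__":
    for label, value in zip(MODELS, mean_per_model(SCORES)):
        print(f"{label}: {'no valid runs' if value is None else value}")
```

Whether the dashboard itself averages scores this way is not stated in the report, so the resulting numbers should be read as a convenience summary rather than an official ranking.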
Test Scenes 6
Scene Order: 0
Participant introduces themselves
ID: greeting-consent
🎯 Goal:
Clearly explain the study and request informed consent in an approachable, jargon-free manner.
📨 Input Events:
chat_msg participant:amira
"Hi Maya, I heard you're recording conversations for your research. What exactly will you capture and how will you use it?"
Ready for Testing
Scene Order: 1
Colleague asks for corpus status
ID: corpus-update
🎯 Goal:
Provide a concise quantitative update on corpus size and annotation scheme, referencing industry standards.
📨 Input Events:
chat_msg colleague:dr_khan
"Quick check-in: how large is your corpus now, and which annotation guidelines are you following?"
Ready for Testing
Scene Order: 2
Phonetics methodology query
ID: vot-measurement
🎯 Goal:
Outline a step-by-step method for measuring VOT differences, including a brief Praat script snippet.
📨 Input Events:
chat_msg peer:lucas
"Any tips on measuring VOT for English vs. Bengali-influenced plosive stops?"
Ready for Testing
Scene Order: 3
Draft community workshop plan
ID: workshop-outline
🎯 Goal:
Deliver a 150-200 word outline for a consent workshop that is structured, engaging, and culturally sensitive.
📨 Input Events:
chat_msg advisor:prof_green
"Please draft the outline for next month's community consent workshop."
Ready for Testing
Scene Order: 4
Interim findings summary
ID: interim-report
🎯 Goal:
Produce a 300-word academic summary of preliminary code-switching findings, integrating at least one illustrative quote and numeric data.
📨 Input Events:
chat_msg supervisor:dr_singh
"Send me a short interim report on your code-switching data before Friday."
Ready for Testing
Scene Order: 5
Follow-up on promised resource
ID: send-anonymization-guide
🎯 Goal:
Recall and honor the earlier promise by sharing an anonymization guide link and brief instructions.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'tags': ['ethics', 'follow_up'], 'content': 'I told Samir I would send the PDF guide on audio anonymization techniques.', 'importance': 4}
📨 Input Events:
chat_msg community_lead:samir
"Did you manage to find that anonymization guide you mentioned?"
Ready for Testing
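The pre-loaded memory in the send-anonymization-guide scene is printed as a Python dict literal. A minimal sketch of how such a record could be read back and filtered is shown here. It assumes the literal form shown in the report is the storage format, which the report does not confirm, and the helper names `load_memory` and `open_promises` are hypothetical.

```python
import ast

# The memory record exactly as printed in the scene definition.
RAW_MEMORY = (
    "{'kind': 'promise', 'tags': ['ethics', 'follow_up'], "
    "'content': 'I told Samir I would send the PDF guide on audio "
    "anonymization techniques.', 'importance': 4}"
)


def load_memory(raw):
    """Parse a dict literal safely (no code execution) into a dict."""
    memory = ast.literal_eval(raw)
    if not isinstance(memory, dict):
        raise TypeError("memory record must be a dict literal")
    return memory


def open_promises(memories, min_importance=1):
    """Return memories of kind 'promise' at or above an importance level."""
    return [
        m for m in memories
        if m.get("kind") == "promise"
        and m.get("importance", 0) >= min_importance
    ]


if __name__ == "__main__":
    memory = load_memory(RAW_MEMORY)
    for promise in open_promises([memory], min_importance=3):
        print(promise["content"])
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals, which matters if memory records ever come from untrusted input.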
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 7085 ms (p95 9435 ms, avg 7317 ms, N 6)
  • qwen/qwen3-8b: 24961 ms (p95 33610 ms, avg 25349 ms, N 6)
  • qwen/qwen-2.5-7b-instru…: 25147 ms (p95 47436 ms, avg 29485 ms, N 6)
  • qwen/qwen3-14b: 28514 ms (p95 37502 ms, avg 29075 ms, N 6)
  • mistralai/mistral-7b-in…: 29153 ms (p95 40904 ms, avg 30402 ms, N 6)
Slowest
  • [email protected]/Qw…: 42158 ms (p95 246576 ms, avg 108383 ms, N 6)
  • meta-llama/llama-3.1-8b…: 29213 ms (p95 35806 ms, avg 28129 ms, N 6)
  • mistralai/mistral-7b-in…: 29153 ms (p95 40904 ms, avg 30402 ms, N 6)
  • qwen/qwen3-14b: 28514 ms (p95 37502 ms, avg 29075 ms, N 6)
  • qwen/qwen-2.5-7b-instru…: 25147 ms (p95 47436 ms, avg 29485 ms, N 6)
Per-scene duration for this suite.
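The p95, avg, and N figures in the latency lists can be derived from per-scene durations along the following lines. This is a sketch under assumptions: the report does not say which percentile method the dashboard uses, so the nearest-rank method is swapped in here, the function name `latency_summary` is hypothetical, and the durations in the usage example are made-up round numbers rather than values from this suite.

```python
import math
from statistics import mean


def latency_summary(durations_ms):
    """Return (p95, avg, n) for a list of per-scene durations in ms."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    rank = math.ceil(0.95 * n)
    return ordered[rank - 1], mean(ordered), n


if __name__ == "__main__":
    # Hypothetical durations for six scenes, not taken from the report.
    p95, avg, n = latency_summary([1000, 2000, 3000, 4000, 5000, 6000])
    print(f"p95 {p95} ms, avg {avg} ms, N {n}")
```

With only six samples, a nearest-rank p95 is simply the slowest scene, so under this method a single slow run is enough to pull p95 far above the average.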
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
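To make the role of the dimension weights concrete, the sketch below combines per-dimension scores into one number with a weighted average. It is an assumption, not the framework's documented formula: only the three top weights appear in the report, so the sketch normalizes by the weights of the dimensions actually supplied. The function name `weighted_score` and the scores in the usage example are hypothetical.

```python
# Weights copied from the report; the remaining dimensions are not listed.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of scores over the dimensions that are supplied."""
    unknown = set(dimension_scores) - set(weights)
    if unknown:
        raise KeyError(f"no weight for: {sorted(unknown)}")
    total_weight = sum(weights[name] for name in dimension_scores)
    if total_weight == 0:
        raise ValueError("no weighted dimensions supplied")
    weighted_sum = sum(
        weights[name] * score for name, score in dimension_scores.items()
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores in the range 0 to 1.
    example = {
        "Character Authenticity": 1.0,
        "Plan Validity": 0.5,
        "Contextual Intelligence": 0.0,
    }
    print(round(weighted_score(example), 3))
```

Normalizing by the supplied weights keeps the result in the same 0 to 1 range as the inputs even when some dimensions are missing. A framework that instead sums over all of its dimensions would produce different values.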
Recent Runs
  • 20704909: Dec. 17, 2025, 12:01 a.m.
  • 34154215: Dec. 16, 2025, 12:01 a.m.
  • 17266149: Dec. 15, 2025, 12:01 a.m.
  • 18381689: Dec. 14, 2025, 12:01 a.m.
  • 17891841: Dec. 13, 2025, 12:01 a.m.
  • 29052357: Dec. 12, 2025, 12:01 a.m.
  • 24933682: Dec. 11, 2025, 12:01 a.m.
  • 18196909: Dec. 10, 2025, 12:01 a.m.
  • 28730625: Dec. 9, 2025, 12:01 a.m.
  • 19277648: Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)