Isabel Morgan
education-academia-research-assistant-characters-maria-montessori
v2.0
Ethical
Backstory: Raised in a bilingual household, Isabel discovered early how language shapes learning. With a master’s in educational psychology, she now supports faculty on large-scale studies into inclusive classroom practices. She also mentors first-generation college applicants and enjoys mining open-source datasets for engagement trends.
100% complete (4/4 scenes)
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faculty orientation question (orientation-query) | 0.560 | 0.567 | 0.665 | 0.551 | 0.000 | 0.503 | 0.639 | 0.000 (Error) | 0.000 (Error) | 0.715 | 0.723 | 0.686 | 0.665 |
| Dataset exploratory report (dataset-insight-long) | 0.481 | 0.685 | 0.596 | 0.319 | 0.023 | 0.520 | 0.418 | 0.000 (Error) | 0.000 (Error) | 0.723 | 0.222 | 0.359 | 0.297 |
| Mentor encouragement (mentorship-email) | 0.715 | 0.721 | 0.817 | 0.851 | 0.000 | 0.719 | 0.822 | 0.000 (Error) | 0.000 (Error) | 0.805 | 0.775 | 0.757 | 0.821 |
| Literature review draft (lit-review-long) | 0.358 | 0.593 | 0.350 | 0.413 | 0.000 (Error) | 0.430 | 0.556 | 0.000 (Error) | 0.000 (Error) | 0.583 | 0.244 | 0.334 | 0.692 |
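The page does not show a per-model mean across scenes. Below is a minimal sketch for deriving one from the matrix. It makes two assumptions. First, cells flagged Error are treated as missing: they are excluded from the mean and counted separately. Second, the unflagged 0.000 scores are treated as genuine results. `HEADERS`, `SCORES`, and `column_summary` are illustrative names, not part of the dashboard.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Column headers as printed in the matrix. Truncated names are kept verbatim,
# so the two "[email protected]…" columns are distinguished only by position.
HEADERS: List[str] = [
    "deepseek/deepseek-r…", "google/gemini-2.5-f…", "google/gemma-3-12b-…",
    "meta-llama/llama-3.…", "microsoft/phi-3-med…", "microsoft/phi-3.5-m…",
    "mistralai/mistral-7…", "neversleep/noromaid…", "[email protected]…",
    "[email protected]…", "qwen/qwen-2.5-7b-in…", "qwen/qwen3-14b", "qwen/qwen3-8b",
]

E = None  # a cell the dashboard flagged "Error" (displayed as 0.000)

# Scores transcribed from the Scene Performance Matrix, one list per scene.
SCORES: Dict[str, List[Optional[float]]] = {
    "orientation-query": [0.560, 0.567, 0.665, 0.551, 0.000, 0.503, 0.639, E, E, 0.715, 0.723, 0.686, 0.665],
    "dataset-insight-long": [0.481, 0.685, 0.596, 0.319, 0.023, 0.520, 0.418, E, E, 0.723, 0.222, 0.359, 0.297],
    "mentorship-email": [0.715, 0.721, 0.817, 0.851, 0.000, 0.719, 0.822, E, E, 0.805, 0.775, 0.757, 0.821],
    "lit-review-long": [0.358, 0.593, 0.350, 0.413, E, 0.430, 0.556, E, E, 0.583, 0.244, 0.334, 0.692],
}


def column_summary(
    scores: Dict[str, List[Optional[float]]],
) -> List[Tuple[str, Optional[float], int]]:
    """Return (header, mean of non-error cells, error count) for each model column."""
    summary = []
    for col, header in enumerate(HEADERS):
        cells = [row[col] for row in scores.values()]
        valid = [c for c in cells if c is not None]
        avg = mean(valid) if valid else None  # None when every scene errored
        summary.append((header, avg, len(cells) - len(valid)))
    return summary


if __name__ == "__main__":
    for header, avg, errors in column_summary(SCORES):
        shown = "n/a" if avg is None else f"{avg:.3f}"
        print(f"{header:<22} mean={shown} errors={errors}")
```

If Error cells should count as zeros instead, replace `E` with `0.0`. This lowers the means of the affected columns.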
Test Scenes (4)

Scene order 0: Faculty orientation question
ID: orientation-query
🎯 Goal: Give a concise, evidence-based overview of inclusive classroom practices and cite one study.
📨 Input Events:
- chat_msg (faculty:dr_clark): "Isabel, could you briefly explain key strategies for making large lectures more inclusive?"
Status: Ready for Testing
Scene order 1: Dataset exploratory report
ID: dataset-insight-long
🎯 Goal: Produce an analytical summary of at least 250 words that highlights at least two data points from the provided CSV and recommends one actionable step.
📨 Input Events:
- chat_msg (research_team:analyst_june): "Here’s the cleaned engagement.csv. Please send your written insights by tomorrow."
Status: Ready for Testing
Scene order 2: Mentor encouragement
ID: mentorship-email
🎯 Goal: Respond with empathy and one practical application tip in under 120 words.
📨 Input Events:
- chat_msg (student:marco_21): "I’m nervous about being the first in my family to apply to college. Any advice?"
Status: Ready for Testing
Scene order 3: Literature review draft
ID: lit-review-long
🎯 Goal: Draft a structured literature review section (~400 words) summarizing recent findings on multilingual instruction, citing at least three peer-reviewed sources.
📨 Input Events:
- chat_msg (professor:yang): "Can you draft the literature review on multilingual instruction benefits for our grant proposal?"
Status: Ready for Testing
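Three of the scene goals state explicit length limits. The sketch below checks a draft against them, under these assumptions:
- Words are counted by splitting on whitespace.
- "Under 120 words" means at most 119.
- "~400 words" allows ±20%.

`LENGTH_RULES`, `word_count`, and `meets_length` are hypothetical names, not part of the test harness.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (min_words, max_words) limits read off the scene goals; None = no bound.
LENGTH_RULES: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "dataset-insight-long": (250, None),  # "250+ word analytical summary"
    "mentorship-email": (None, 119),      # "under 120 words"
    "lit-review-long": (320, 480),        # "~400 words", assuming a ±20% tolerance
}


def word_count(text: str) -> int:
    """Approximate word count: whitespace-separated tokens."""
    return len(text.split())


def meets_length(scene_id: str, text: str) -> bool:
    """True if the draft satisfies the (assumed) length rule for the scene."""
    lo, hi = LENGTH_RULES[scene_id]
    n = word_count(text)
    if lo is not None and n < lo:
        return False
    if hi is not None and n > hi:
        return False
    return True


if __name__ == "__main__":
    reply = "Take it one step at a time. " * 10  # 70 words
    print(meets_length("mentorship-email", reply))
```

Length is only one part of each goal. Requirements such as "cite one study" or "at least three peer-reviewed sources" would need a separate check.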
Latency by Model (This Suite)
Fastest
- neversleep/noromaid-20b: 7311 ms (p95 17918 ms, avg 9946 ms, N = 4)
- [email protected]/Qw…: 11448 ms (p95 14190 ms, avg 12025 ms, N = 4)
- google/gemini-2.5-flash: 19665 ms (p95 22463 ms, avg 19810 ms, N = 4)
- qwen/qwen-2.5-7b-instru…: 21718 ms (p95 23837 ms, avg 21729 ms, N = 4)
- meta-llama/llama-3.1-8b…: 24648 ms (p95 26378 ms, avg 24108 ms, N = 4)
Slowest
- microsoft/phi-3-medium-…: 112729 ms (p95 138920 ms, avg 100982 ms, N = 4)
- qwen/qwen3-8b: 47174 ms (p95 60956 ms, avg 44719 ms, N = 4)
- [email protected]/Qw…: 42319 ms (p95 42590 ms, avg 42071 ms, N = 4)
- microsoft/phi-3.5-mini-…: 38333 ms (p95 53051 ms, avg 40552 ms, N = 4)
- google/gemma-3-12b-it: 34890 ms (p95 44886 ms, avg 34309 ms, N = 4)
Per-scene duration for this suite.
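Each latency entry reports an average and a p95 over N = 4 runs. The page does not say how its p95 is computed. Below is a minimal sketch that assumes linear interpolation between sorted samples, the same convention as NumPy's default percentile method. The input durations in the usage line are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (NumPy's default 'linear' convention); q in [0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    pos = (len(xs) - 1) * q / 100  # fractional index into the sorted samples
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def latency_summary(durations_ms: Sequence[float]) -> Dict[str, float]:
    """p95, average, and sample count for one model's per-scene durations."""
    return {
        "p95": percentile(durations_ms, 95),
        "avg": mean(durations_ms),
        "N": len(durations_ms),
    }


if __name__ == "__main__":
    # Hypothetical per-scene durations in milliseconds, not taken from the page.
    print(latency_summary([1000, 2000, 3000, 4000]))
```

With only four samples, an interpolated p95 sits close to the slowest run. For that reason, a p95 far above the average (as for noromaid-20b) usually reflects one slow scene rather than consistently slow responses.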
Evaluation Schema
Enhanced Framework
Version: v2 (active), 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
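The page lists only the three highest dimension weights. It does not state how dimension scores combine into a scene score. As an illustration only, the sketch below computes a normalized weighted mean over the dimensions that were scored. The aggregation rule, the function name `weighted_score`, and the example dimension scores are all assumptions.

```python
from typing import Dict

# The three weights shown on the page. The full weight table is not listed there.
TOP_WEIGHTS: Dict[str, float] = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Normalized weighted mean over dimensions present in both dicts (assumed rule)."""
    shared = [d for d in dimension_scores if d in weights]
    if not shared:
        raise ValueError("no weighted dimensions were scored")
    total_w = sum(weights[d] for d in shared)
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_w


if __name__ == "__main__":
    # Hypothetical per-dimension scores for a single scene.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, TOP_WEIGHTS), 3))
```

Dividing by the sum of the weights keeps the result on the same 0 to 1 scale as the inputs, even when only some dimensions are scored.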
Recent Runs
- 20830580: Dec. 17, 2025, midnight
- 24618545: Dec. 16, 2025, midnight
- 19790257: Dec. 15, 2025, midnight
- 22365595: Dec. 14, 2025, midnight
- 19675769: Dec. 13, 2025, midnight
- 24288093: Dec. 12, 2025, midnight
- 20640215: Dec. 11, 2025, midnight
- 20002950: Dec. 10, 2025, midnight
- 22903751: Dec. 9, 2025, midnight
- 20140973: Dec. 8, 2025, midnight