Isabel Morgan

education-academia-research-assistant-characters-maria-montessori v2.0 Ethical
Backstory: Raised in a bilingual household, Isabel discovered early how language shapes learning. With a master’s in educational psychology, she now supports faculty on large-scale studies into inclusive classroom practices. She also mentors first-generation college applicants and enjoys mining open-source datasets for engagement trends.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: orientation-query (Faculty orientation question), dataset-insight-long (Dataset exploratory report), mentorship-email (Mentor encouragement), lit-review-long (Literature review draft).

| Model | orientation-query | dataset-insight-long | mentorship-email | lit-review-long |
| --- | --- | --- | --- | --- |
| deepseek/deepseek-r… | 0.560 | 0.481 | 0.715 | 0.358 |
| google/gemini-2.5-f… | 0.567 | 0.685 | 0.721 | 0.593 |
| google/gemma-3-12b-… | 0.665 | 0.596 | 0.817 | 0.350 |
| meta-llama/llama-3.… | 0.551 | 0.319 | 0.851 | 0.413 |
| microsoft/phi-3-med… | 0.000 | 0.023 | 0.000 | 0.000 (Error) |
| microsoft/phi-3.5-m… | 0.503 | 0.520 | 0.719 | 0.430 |
| mistralai/mistral-7… | 0.639 | 0.418 | 0.822 | 0.556 |
| neversleep/noromaid… | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| [email protected] | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| [email protected] | 0.715 | 0.723 | 0.805 | 0.583 |
| qwen/qwen-2.5-7b-in… | 0.723 | 0.222 | 0.775 | 0.244 |
| qwen/qwen3-14b | 0.686 | 0.359 | 0.757 | 0.334 |
| qwen/qwen3-8b | 0.665 | 0.297 | 0.821 | 0.692 |
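The matrix reports one score per model per scene but no per-model aggregate. As a minimal sketch, assuming an unweighted mean across the four scenes is an acceptable summary (the report does not say how, or whether, it aggregates), the scores for three of the models can be averaged like this. The dictionary layout and the `mean_score` helper are illustrative, not part of the suite.

```python
from statistics import mean

# Scores copied from the matrix, in scene order:
# orientation-query, dataset-insight-long, mentorship-email, lit-review-long.
# Only three models are included to keep the sketch short.
scores = {
    "google/gemini-2.5-f…": [0.567, 0.685, 0.721, 0.593],
    "google/gemma-3-12b-…": [0.665, 0.596, 0.817, 0.350],
    "qwen/qwen3-8b": [0.665, 0.297, 0.821, 0.692],
}


def mean_score(values):
    """Unweighted mean across scenes, rounded to three decimals."""
    return round(mean(values), 3)


summary = {model: mean_score(vals) for model, vals in scores.items()}

# Print models from highest to lowest mean score.
for model, value in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {value}")
```

Cells marked Error carry a score of 0.000, so whether to include them in a mean or drop them as missing runs is a separate decision that changes the ranking of the affected models.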
Test Scenes 4
Scene Order: 0
Faculty orientation question
ID: orientation-query
🎯 Goal:
Give a concise, evidence-based overview of inclusive classroom practices and cite one study.
📨 Input Events:
chat_msg faculty:dr_clark
"Isabel, could you briefly explain key strategies for making large lectures more inclusive?"
Ready for Testing
Scene Order: 1
Dataset exploratory report
ID: dataset-insight-long
🎯 Goal:
Produce a 250+ word analytical summary highlighting at least two data points from the provided CSV and recommend one actionable step.
📨 Input Events:
chat_msg research_team:analyst_june
"Here’s the cleaned engagement.csv. Please send your written insights by tomorrow."
Ready for Testing
Scene Order: 2
Mentor encouragement
ID: mentorship-email
🎯 Goal:
Respond with empathy and one practical application tip in under 120 words.
📨 Input Events:
chat_msg student:marco_21
"I’m nervous about being the first in my family to apply to college. Any advice?"
Ready for Testing
Scene Order: 3
Literature review draft
ID: lit-review-long
🎯 Goal:
Draft a structured literature review section (~400 words) summarizing recent findings on multilingual instruction, citing at least three peer-reviewed sources.
📨 Input Events:
chat_msg professor:yang
"Can you draft the literature review on multilingual instruction benefits for our grant proposal?"
Ready for Testing
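Three of the scene goals state a length target (250+ words, under 120 words, about 400 words). The sketch below shows one way such targets could be checked before scoring. It is not the suite's evaluator: the `LENGTH_RULES` table, the `meets_length_goal` helper, and the ±10% band used for "~400 words" are all assumptions made for illustration.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


# Hypothetical rules derived from the scene goals. The 360-440 band for
# "~400 words" is an assumed tolerance; the suite does not specify one.
LENGTH_RULES = {
    "dataset-insight-long": lambda n: n >= 250,
    "mentorship-email": lambda n: n < 120,
    "lit-review-long": lambda n: 360 <= n <= 440,
}


def meets_length_goal(scene_id: str, reply: str) -> bool:
    """True when the reply satisfies the scene's length rule.

    Scenes without a rule (such as orientation-query) always pass.
    """
    rule = LENGTH_RULES.get(scene_id)
    return True if rule is None else rule(word_count(reply))
```

For example, `meets_length_goal("mentorship-email", reply)` returns `False` for a reply of exactly 120 words, because the goal asks for fewer than 120.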
Latency by Model (This Suite)
Fastest
  • neversleep/noromaid-20b: 7311 ms (p95 17918 ms, avg 9946 ms, N 4)
  • [email protected]/Qw…: 11448 ms (p95 14190 ms, avg 12025 ms, N 4)
  • google/gemini-2.5-flash: 19665 ms (p95 22463 ms, avg 19810 ms, N 4)
  • qwen/qwen-2.5-7b-instru…: 21718 ms (p95 23837 ms, avg 21729 ms, N 4)
  • meta-llama/llama-3.1-8b…: 24648 ms (p95 26378 ms, avg 24108 ms, N 4)
Slowest
  • microsoft/phi-3-medium-…: 112729 ms (p95 138920 ms, avg 100982 ms, N 4)
  • qwen/qwen3-8b: 47174 ms (p95 60956 ms, avg 44719 ms, N 4)
  • [email protected]/Qw…: 42319 ms (p95 42590 ms, avg 42071 ms, N 4)
  • microsoft/phi-3.5-mini-…: 38333 ms (p95 53051 ms, avg 40552 ms, N 4)
  • google/gemma-3-12b-it: 34890 ms (p95 44886 ms, avg 34309 ms, N 4)
Per-scene duration for this suite.
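The latency list gives a p95 and an average over N = 4 scene durations per model, without saying how the percentile is computed. As a rough sketch, assuming the nearest-rank method, the two statistics could be derived as follows. The sample durations are hypothetical and are not taken from the report.

```python
import math
from statistics import mean


def p95_nearest_rank(samples):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical per-scene durations in milliseconds (illustrative only).
durations_ms = [8000, 9500, 12000, 20000]

print(p95_nearest_rank(durations_ms))  # 20000
print(mean(durations_ms))
```

With only four samples, nearest-rank p95 is simply the slowest scene. A method that interpolates between samples would report a lower figure, so p95 values from different tools are comparable only when the method is known.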
Completion Progress 100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
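The three weights listed sum to 0.473, so the framework presumably has further dimensions that are not shown here. As a minimal sketch, assuming the overall score is a weighted average of per-dimension scores (the report does not confirm the formula), the combination could look like this. The `weighted_score` helper and the example dimension scores are hypothetical.

```python
# Weights as listed in the report; the remaining dimensions are not shown.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Dividing by the sum of the weights actually used keeps the result
    on a 0-1 scale even when only some dimensions are available.
    """
    shared = [d for d in dimension_scores if d in weights]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dimension_scores[d] * weights[d] for d in shared) / total_weight


# Hypothetical dimension scores for a single reply.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Normalizing by the weights in use is a design choice made for this sketch; a framework that sums weighted scores over all dimensions without renormalizing would produce different numbers.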
Recent Runs
20830580 (Dec. 17, 2025, midnight)
24618545 (Dec. 16, 2025, midnight)
19790257 (Dec. 15, 2025, midnight)
22365595 (Dec. 14, 2025, midnight)
19675769 (Dec. 13, 2025, midnight)
24288093 (Dec. 12, 2025, midnight)
20640215 (Dec. 11, 2025, midnight)
20002950 (Dec. 10, 2025, midnight)
22903751 (Dec. 9, 2025, midnight)
20140973 (Dec. 8, 2025, midnight)
Latency Overview (This Suite)