Dr. Lila Moreno

marvel-universe-villains-hypatia-of-alexandria v2.0 Ethical
Backstory: Dr. Moreno is a former materials scientist who now travels to schools running hands-on STEM workshops for middle-school girls. She weaves Marvel villain origin stories into lessons so complex physics and chemistry ideas feel cinematic and fun. Her classroom presence is endlessly enthusiastic, yet she never rushes a student question, believing patience sparks confidence and curiosity.
100% Complete (4/4 scenes)
Model Performance Overview
Scene Performance Matrix
Scores by model (rows) and test scene (columns). "(Error)" marks cells the dashboard flagged as Error; all of them show 0.000.

Model                 welcome-question  workshop-outline  live-demo      reflection-post
deepseek/deepseek-r…  0.679             0.125             0.699          0.502
google/gemini-2.5-f…  0.652             0.459             0.619          0.485
google/gemma-3-12b-…  0.671             0.182             0.637          0.226
meta-llama/llama-3.…  0.705             0.000             0.714          0.418
microsoft/phi-3-med…  0.000 (Error)     0.000             0.000 (Error)  0.000
microsoft/phi-3.5-m…  0.906             0.455             0.723          0.420
mistralai/mistral-7…  0.771             0.443             0.694          0.377
neversleep/noromaid…  0.000 (Error)     0.000 (Error)     0.468          0.000 (Error)
[email protected]     0.000 (Error)     0.000 (Error)     0.507          0.000 (Error)
[email protected]     0.000 (Error)     0.000 (Error)     0.000 (Error)  0.000 (Error)
[email protected]     0.632             0.389             0.000          0.280
[email protected]     0.680             0.490             0.745          0.623
[email protected]     0.689             0.462             0.640          0.446
qwen/qwen-2.5-7b-in…  0.729             0.425             0.665          0.601
qwen/qwen3-14b        0.696             0.430             0.647          0.118
qwen/qwen3-8b         0.689             0.000             0.764          0.590
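To compare models across scenes, the scores from the Scene Performance Matrix can be summarized in a few lines of Python. This is a minimal sketch, assuming an unweighted mean over the four scenes with every displayed 0.000 (errored or not) counted as zero; the dashboard does not state whether or how it aggregates across scenes. Model labels are copied as displayed, so the five identical `[email protected]` entries are told apart only by row index (0-based, in dashboard order).

```python
from statistics import mean

SCENES = ["welcome-question", "workshop-outline", "live-demo", "reflection-post"]

# (label as displayed, scores in SCENES order, indices of scenes flagged "Error").
# Truncated and obfuscated labels are kept verbatim; rows are identified by index.
ROWS = [
    ("deepseek/deepseek-r…", [0.679, 0.125, 0.699, 0.502], set()),
    ("google/gemini-2.5-f…", [0.652, 0.459, 0.619, 0.485], set()),
    ("google/gemma-3-12b-…", [0.671, 0.182, 0.637, 0.226], set()),
    ("meta-llama/llama-3.…", [0.705, 0.000, 0.714, 0.418], set()),
    ("microsoft/phi-3-med…", [0.000, 0.000, 0.000, 0.000], {0, 2}),
    ("microsoft/phi-3.5-m…", [0.906, 0.455, 0.723, 0.420], set()),
    ("mistralai/mistral-7…", [0.771, 0.443, 0.694, 0.377], set()),
    ("neversleep/noromaid…", [0.000, 0.000, 0.468, 0.000], {0, 1, 3}),
    ("[email protected]", [0.000, 0.000, 0.507, 0.000], {0, 1, 3}),
    ("[email protected]", [0.000, 0.000, 0.000, 0.000], {0, 1, 2, 3}),
    ("[email protected]", [0.632, 0.389, 0.000, 0.280], set()),
    ("[email protected]", [0.680, 0.490, 0.745, 0.623], set()),
    ("[email protected]", [0.689, 0.462, 0.640, 0.446], set()),
    ("qwen/qwen-2.5-7b-in…", [0.729, 0.425, 0.665, 0.601], set()),
    ("qwen/qwen3-14b", [0.696, 0.430, 0.647, 0.118], set()),
    ("qwen/qwen3-8b", [0.689, 0.000, 0.764, 0.590], set()),
]


def model_means():
    """(row index, label, unweighted mean over all four scenes); errored cells count as 0.0."""
    return [(i, name, mean(scores)) for i, (name, scores, _) in enumerate(ROWS)]


def best_per_scene():
    """Map each scene to (row index, score) of its highest-scoring row."""
    best = {}
    for s, scene in enumerate(SCENES):
        i = max(range(len(ROWS)), key=lambda r: ROWS[r][1][s])
        best[scene] = (i, ROWS[i][1][s])
    return best


def error_counts():
    """Number of scenes flagged "Error", for rows that have any."""
    return {i: len(errs) for i, (_, _, errs) in enumerate(ROWS) if errs}


if __name__ == "__main__":
    for i, name, avg in sorted(model_means(), key=lambda t: t[2], reverse=True):
        print(f"{i:2d}  {name:<22} {avg:.4f}")
    print(best_per_scene())
    print(error_counts())
```

Run on these values, row 11 (one of the `[email protected]` entries) has the highest unweighted mean, 0.6345, while `microsoft/phi-3.5-m…` (row 5) holds the single highest cell, 0.906 on welcome-question. Counting errored cells as zero penalizes rows 4, 7, 8 and 9 heavily; excluding those cells from the mean instead would be an equally defensible choice.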
Test Scenes (4)

Scene Order 0: First curiosity (ID: welcome-question)
Status: Ready for Testing
🎯 Goal:
Warmly greet the student and explain how villain stories fit into today’s lesson, keeping the reply under 80 words and in an enthusiastic but patient voice.
📨 Input Events:
chat_msg student:emma
"Why are we talking about super-villains in science class?"

Scene Order 1: Detailed lesson plan request (ID: workshop-outline)
Status: Ready for Testing
🎯 Goal:
Provide a single-paragraph outline (~300 words) linking Doctor Octopus’s accident to Newton’s laws and chemical bonding, specifying three hands-on activities and their required materials, while sounding enthusiastic and patient.
📨 Input Events:
chat_msg teacher_supervisor
"Could you draft the full outline for tomorrow’s workshop?"

Scene Order 2: Safety clarification (ID: live-demo)
Status: Ready for Testing
🎯 Goal:
Answer the safety question clearly in 2–3 sentences, stress wearing a lab coat and goggles, and keep the tone calm and encouraging.
📨 Input Events:
chat_msg student:amal
"Is the exploding pumpkin experiment dangerous?"

Scene Order 3: Post-workshop blog (ID: reflection-post)
Status: Ready for Testing
🎯 Goal:
Write a 250-word reflective blog post describing how the Green Goblin’s serum story helped students grasp reaction rates, highlighting two students’ responses and ending with an empowering call to pursue STEM.
📨 Input Events:
chat_msg organization_blog_editor
"Please draft a reflection on today’s workshop for the outreach blog."
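Several scene goals contain mechanical constraints: a word limit, a target length, a sentence range and required safety terms. A minimal sketch of automated checks for them follows, assuming whitespace-separated word counts and a naive punctuation-based sentence splitter. `check_reply`, its helpers and the ±15% tolerance on the "~300 words" and "250-word" targets are hypothetical, not the evaluation framework's own scoring; dimensions such as Character Authenticity cannot be checked this way.

```python
import re


def word_count(text: str) -> int:
    """Whitespace-separated tokens: a rough proxy for words."""
    return len(text.split())


def sentence_count(text: str) -> int:
    """Naive count of spans ending in '.', '!' or '?'; abbreviations over-count."""
    return len(re.findall(r"[^.!?]+[.!?]+", text))


def paragraph_count(text: str) -> int:
    """Blocks of text separated by at least one blank line."""
    return len([p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()])


def near(value: int, target: int, tolerance: float = 0.15) -> bool:
    """True if value is within +/- tolerance (a fraction) of target (assumed tolerance)."""
    return abs(value - target) <= tolerance * target


def check_reply(scene_id: str, reply: str) -> dict:
    """Hypothetical pass/fail checks mirroring each scene's stated goal."""
    words = word_count(reply)
    lower = reply.lower()
    if scene_id == "welcome-question":
        return {"under_80_words": words < 80}
    if scene_id == "workshop-outline":
        return {
            "single_paragraph": paragraph_count(reply) == 1,
            "about_300_words": near(words, 300),
        }
    if scene_id == "live-demo":
        return {
            "two_or_three_sentences": 2 <= sentence_count(reply) <= 3,
            "mentions_lab_coat": re.search(r"lab[- ]coat", lower) is not None,
            "mentions_goggles": "goggles" in lower,
        }
    if scene_id == "reflection-post":
        return {"about_250_words": near(words, 250)}
    raise KeyError(f"unknown scene id: {scene_id}")
```

Keeping each check as a named boolean makes it easy to see which constraint a reply missed, rather than collapsing everything into one score.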
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 446 ms (p95 12667 ms, avg 3974 ms, N = 4)
  • [email protected]/Qw…: 8522 ms (p95 8747 ms, avg 8576 ms, N = 4)
  • [email protected]/Qw…: 9456 ms (p95 10441 ms, avg 9066 ms, N = 4)
  • [email protected]/Qw…: 12000 ms (p95 14476 ms, avg 12229 ms, N = 4)
  • google/gemini-2.5-flash: 25297 ms (p95 37865 ms, avg 26098 ms, N = 35)
Slowest
  • microsoft/phi-3-medium-…: 532029 ms (p95 621959 ms, avg 486890 ms, N = 46)
  • qwen/qwen3-8b: 85474 ms (p95 134142 ms, avg 87432 ms, N = 43)
  • [email protected]/Qw…: 43919 ms (p95 45245 ms, avg 43278 ms, N = 4)
  • microsoft/phi-3.5-mini-…: 40369 ms (p95 55420 ms, avg 48099 ms, N = 31)
  • deepseek/deepseek-r1-di…: 33843 ms (p95 40111 ms, avg 34030 ms, N = 44)
Per-scene duration for this suite.
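Each latency entry pairs an unlabeled headline figure with a p95, an average and a sample count N. The sketch below shows how such a summary could be computed from raw per-scene durations. It assumes the nearest-rank definition of p95 and reports the median only as one common choice of headline statistic; the dashboard does not say which statistic or percentile method it uses. Under the nearest-rank definition, with N = 4 the p95 is simply the slowest of the four runs, so one slow run can push the p95 and average far above a typical run.

```python
import math
from statistics import mean, median


def nearest_rank_percentile(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it (nearest-rank method)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank, always >= 1 here
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Median, nearest-rank p95, mean and count for one model's durations in ms."""
    return {
        "median_ms": median(durations_ms),
        "p95_ms": nearest_rank_percentile(durations_ms, 95),
        "avg_ms": mean(durations_ms),
        "n": len(durations_ms),
    }
```

Interpolating percentile methods (for example `statistics.quantiles` with its default settings) give somewhat different p95 values on small samples, which is another reason to treat p95 figures with N = 4 as rough.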
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
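If the framework combines per-dimension scores as a weighted mean (an assumption; the page shows only the three top weights and not the aggregation rule), an overall score could be computed as below. Only the three visible weights are included, and they sum to 0.473, so the result here is normalized over those three dimensions rather than the full framework. `weighted_score` and `TOP_WEIGHTS` are hypothetical names.

```python
# Weights copied from the "Top Weighted Dimensions" panel. Any other
# dimensions in the framework are not shown, so they are left out here.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted mean over dimensions present in both mappings (assumed aggregation rule)."""
    shared = [d for d in weights if d in dimension_scores]
    if not shared:
        raise ValueError("no dimension has both a weight and a score")
    total = sum(weights[d] for d in shared)
    return sum(weights[d] * dimension_scores[d] for d in shared) / total
```

Normalizing by the weights actually used keeps the result on the same 0 to 1 scale as the inputs even when some dimensions are missing a score.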
Recent Runs
  • 32751228 (Dec. 17, 2025, midnight)
  • 37866535 (Dec. 16, 2025, midnight)
  • 30581482 (Dec. 15, 2025, midnight)
  • 33451788 (Dec. 14, 2025, midnight)
  • 30392333 (Dec. 13, 2025, midnight)
  • 37015912 (Dec. 12, 2025, midnight)
  • 31681377 (Dec. 11, 2025, midnight)
  • 31207780 (Dec. 10, 2025, midnight)
  • 35279090 (Dec. 9, 2025, midnight)
  • 31558687 (Dec. 8, 2025, midnight)
Latency Overview (This Suite)