Evelyn Ramirez

literature-history-culture-museum-curator-characters-jacob-burckhardt v2.0 Ethical
Backstory: Evelyn Ramirez grew up bilingual, steeped in folklore and family history shared at every gathering. With doctorates in comparative literature and museum studies, she spent a decade curating traveling shows that amplified underrepresented literary voices. Now she directs a metropolitan museum’s Literature, History & Culture wing, forging global partnerships to showcase rare manuscripts alongside interactive digital installations.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| welcome-tour (Upcoming Exhibit Inquiry) | 0.700 | 0.866 | 0.719 | 0.712 | 0.000 | 0.000 | 0.727 | 0.668 | 0.000 (Error) | 0.788 | 0.712 | 0.669 | 0.764 |
| curator-journal (Reflection Journal Entry) | 0.333 | 0.000 | 0.859 | 0.425 | 0.000 (Error) | 0.000 (Error) | 0.643 | 0.000 (Error) | 0.000 (Error) | 0.683 | 0.623 | 0.852 | 0.767 |
| interactive-translation (Poem Nuance Explanation) | 0.368 | 0.684 | 0.823 | 0.462 | 0.000 | 0.687 | 0.854 | 0.000 | 0.000 (Error) | 0.808 | 0.851 | 0.755 | 0.891 |
| virtual-lecture (Online Guest Lecture Script) | 0.171 | 0.303 | 0.280 | 0.362 | 0.005 | 0.000 (Error) | 0.200 | 0.395 | 0.000 (Error) | 0.515 | 0.384 | 0.435 | 0.385 |
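As a rough way to read the matrix above by scene rather than by cell, the sketch below averages each scene's scores across the 13 model columns and counts zero-score cells. It assumes cells flagged Error count as 0.0, matching the 0.000 displayed for them; the names `matrix` and `scene_summary` are illustrative and not part of the dashboard.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, one entry per model
# column (13 columns). Cells flagged "Error" are entered as 0.0, which is
# an assumption based on the 0.000 the dashboard shows for them.
matrix = {
    "welcome-tour": [0.700, 0.866, 0.719, 0.712, 0.000, 0.000, 0.727,
                     0.668, 0.000, 0.788, 0.712, 0.669, 0.764],
    "curator-journal": [0.333, 0.000, 0.859, 0.425, 0.000, 0.000, 0.643,
                        0.000, 0.000, 0.683, 0.623, 0.852, 0.767],
    "interactive-translation": [0.368, 0.684, 0.823, 0.462, 0.000, 0.687,
                                0.854, 0.000, 0.000, 0.808, 0.851, 0.755,
                                0.891],
    "virtual-lecture": [0.171, 0.303, 0.280, 0.362, 0.005, 0.000, 0.200,
                        0.395, 0.000, 0.515, 0.384, 0.435, 0.385],
}


def scene_summary(scores):
    """Return (mean score, number of zero-score cells) for one scene."""
    return mean(scores), sum(1 for s in scores if s == 0.0)


summaries = {scene: scene_summary(scores) for scene, scores in matrix.items()}
lowest_scene = min(summaries, key=lambda scene: summaries[scene][0])

for scene, (avg, zeros) in summaries.items():
    print(f"{scene}: mean={avg:.3f}, zero cells={zeros}")
print("lowest mean:", lowest_scene)
```

Under these assumptions, virtual-lecture has the lowest mean score and curator-journal has the most zero-score cells.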
Test Scenes 4
Scene Order: 0
Upcoming Exhibit Inquiry
ID: welcome-tour
🎯 Goal:
Provide a concise, inviting overview of the next exhibition while highlighting its community relevance.
📨 Input Events:
chat_msg visitor_lena
"Hi Evelyn, what exhibit opens next month and why should I bring my book club?"
Ready for Testing
Scene Order: 1
Reflection Journal Entry
ID: curator-journal
🎯 Goal:
Write a 250+ word journal entry reflecting on today’s manuscript restoration session, conveying personal observations and future plans.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'The 15th-century codex arrived slightly water-damaged.', 'importance': 4}
  • 💭 {'kind': 'preference', 'content': 'Evelyn prefers to note sensory details for future exhibit labels.', 'importance': 3}
📨 Input Events:
world_event system
"End of day; Evelyn prepares her private curator’s journal."
Ready for Testing
Scene Order: 2
Poem Nuance Explanation
ID: interactive-translation
🎯 Goal:
Offer a brief yet precise explanation of a Spanish poem’s cultural nuance, then suggest one related museum resource.
📨 Input Events:
chat_msg student_omar
"Could you explain the cultural context behind the line 'La luna dio paso al canto del gallo' from this poem?"
Ready for Testing
Scene Order: 3
Online Guest Lecture Script
ID: virtual-lecture
🎯 Goal:
Deliver a three-paragraph (300+ words total) lecture script introducing diaspora storytelling traditions and linking them to two artifacts in the current collection.
📨 Input Events:
superchat event_host YouTube $50
"Your audience is live. Please begin your lecture when ready."
Ready for Testing
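Two of the scene goals above state length requirements (250+ words for the journal entry; three paragraphs and 300+ words for the lecture script). The sketch below shows one way such a constraint could be checked. It is not the suite's actual evaluator: the function name `meets_length_goal`, whitespace-based word counting, and blank-line paragraph splitting are all assumptions.

```python
import re


def meets_length_goal(text, min_words, paragraphs=None):
    """Check a response against a minimum word count and, optionally,
    an exact paragraph count (paragraphs separated by blank lines)."""
    words = text.split()  # assumption: whitespace-delimited words
    paras = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    if len(words) < min_words:
        return False
    return paragraphs is None or len(paras) == paragraphs


# Hypothetical response: three paragraphs of 100 words each.
draft = "\n\n".join(" ".join(["word"] * 100) for _ in range(3))

lecture_ok = meets_length_goal(draft, min_words=300, paragraphs=3)
journal_ok = meets_length_goal(draft, min_words=250)
print(lecture_ok, journal_ok)
```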
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 12800 ms (p95 14896 ms, avg 13124 ms, N 4)
  • google/gemini-2.5-flash: 21202 ms (p95 23159 ms, avg 19286 ms, N 8)
  • qwen/qwen3-8b: 22844 ms (p95 28892 ms, avg 23482 ms, N 8)
  • google/gemma-3-12b-it: 23671 ms (p95 28353 ms, avg 24215 ms, N 8)
  • qwen/qwen-2.5-7b-instru…: 23758 ms (p95 28847 ms, avg 24498 ms, N 8)
Slowest
  • microsoft/phi-3-medium-…: 161410 ms (p95 201004 ms, avg 156476 ms, N 8)
  • microsoft/phi-3.5-mini-…: 50784 ms (p95 235450 ms, avg 97531 ms, N 8)
  • [email protected]/Qw…: 44635 ms (p95 54680 ms, avg 47116 ms, N 4)
  • deepseek/deepseek-r1-di…: 31612 ms (p95 35640 ms, avg 31595 ms, N 8)
  • qwen/qwen3-14b: 30285 ms (p95 48569 ms, avg 32696 ms, N 7)
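For readers unfamiliar with the p95, avg, and N columns in the list above, here is a minimal sketch of how such a summary can be derived from per-run durations. The sample durations are hypothetical, and the nearest-rank percentile method is an assumption; the dashboard may interpolate between samples instead, which would give slightly different p95 values.

```python
import math
from statistics import mean


def summarize_latency(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds,
    using the nearest-rank definition of the 95th percentile."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1], mean(ordered), len(ordered)


# Hypothetical durations for one model across four runs.
sample = [12000, 12500, 13000, 15000]
p95, avg, n = summarize_latency(sample)
print(f"p95={p95} ms, avg={avg} ms, N={n}")
```

With small N, nearest-rank p95 is simply the largest observed duration, so a single slow run dominates it. That is one plausible reason a model's p95 can sit far above its average.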
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
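The weights above suggest that a scene score is some weighted combination of per-dimension scores. The sketch below shows one common form, a weighted average normalized by the weights actually present. This is an assumption rather than the framework's documented formula; only the three weights come from this page, and the per-dimension scores are hypothetical.

```python
def weighted_score(scores, weights):
    """Weighted average over the dimensions present in both mappings,
    normalized by the sum of their weights."""
    shared = scores.keys() & weights.keys()
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(scores[d] * weights[d] for d in shared) / total_weight


# Weights as listed under Top Weighted Dimensions; the remaining
# dimensions of the framework are not shown on this page.
weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

# Hypothetical per-dimension scores for a single response.
scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}

overall = weighted_score(scores, weights)
print(f"{overall:.3f}")
```

Normalizing by the weights present keeps the result in the same range as the input scores even though only a subset of the dimensions is listed here.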
Recent Runs
  • 30380170 • Dec. 17, 2025, midnight
  • 35168821 • Dec. 16, 2025, midnight
  • 28295169 • Dec. 15, 2025, midnight
  • 31241097 • Dec. 14, 2025, midnight
  • 28082829 • Dec. 13, 2025, midnight
  • 34164836 • Dec. 12, 2025, midnight
  • 29385974 • Dec. 11, 2025, midnight
  • 29102759 • Dec. 10, 2025, midnight
  • 32536423 • Dec. 9, 2025, midnight
  • 29286371 • Dec. 8, 2025, midnight
Latency Overview (This Suite)