Eleni Papadakis

greek-gods-pausanias v2.0 Ethical
Backstory: Eleni is a certified archaeologist who left museum archives to lead intimate heritage tours across Greece's most storied ruins. She weaves mythic lore with verified findings, helping visitors feel the pulse of ancient life while staying true to scholarly evidence. Known for her keen observation and empathy, she gently nudges guests toward responsible travel and site preservation.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes:
  arrival-greeting: First meeting at the site entrance
  myth-of-theseus: Blending myth and archaeology at Knossos
  preservation-question: How travelers can help
  evening-recap: Guide’s reflective journal entry

Scores per model and scene ("Error" marks a run the dashboard flagged as failed):

Model                   arrival-greeting   myth-of-theseus   preservation-question   evening-recap
deepseek/deepseek-r…    0.865              0.516             0.028                   0.565
google/gemini-2.5-f…    0.899              0.234             0.754                   0.356
google/gemma-3-12b-…    0.888              0.206             0.776                   0.545
meta-llama/llama-3.…    0.913              0.644             0.802                   0.000
microsoft/phi-3-med…    0.000 (Error)      0.000             0.000                   0.002
microsoft/phi-3.5-m…    0.919              0.547             0.651                   0.000 (Error)
mistralai/mistral-7…    0.769              0.578             0.875                   0.280
neversleep/noromaid…    0.000 (Error)      0.002             0.620                   0.000
[email protected]       0.000 (Error)      0.000 (Error)     0.000 (Error)           0.000 (Error)
[email protected]       0.000 (Error)      0.000 (Error)     0.000 (Error)           0.000 (Error)
[email protected]       0.772              0.460             0.455                   0.721
[email protected]       0.752              0.547             0.611                   0.761
[email protected]       0.887              0.462             0.655                   0.000
qwen/qwen-2.5-7b-in…    0.729              0.388             0.614                   0.731
qwen/qwen3-14b          0.802              0.776             0.681                   0.409
qwen/qwen3-8b           0.742              0.434             0.883                   0.864
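As a rough aid for reading the matrix, the sketch below averages each model's scores across the four scenes and counts its error runs. The three rows are copied from the matrix. Excluding error runs from the mean is an assumption made here for illustration, not something the dashboard states, and the `summarize` helper is hypothetical.

```python
from statistics import mean

# Scores copied from the matrix; None marks a run flagged "Error".
# Scene order: arrival-greeting, myth-of-theseus, preservation-question, evening-recap.
scores = {
    "qwen/qwen3-14b": [0.802, 0.776, 0.681, 0.409],
    "qwen/qwen3-8b": [0.742, 0.434, 0.883, 0.864],
    "microsoft/phi-3-med…": [None, 0.000, 0.000, 0.002],
}

def summarize(row):
    """Return (mean of non-error scores or None, number of error runs)."""
    valid = [s for s in row if s is not None]
    errors = len(row) - len(valid)
    # Assumption: error runs are left out of the mean rather than counted as 0.
    return (mean(valid) if valid else None, errors)

summary = {model: summarize(row) for model, row in scores.items()}

for model, (avg, errors) in summary.items():
    label = f"{avg:.3f}" if avg is not None else "n/a"
    print(f"{model}: mean={label} errors={errors}")
```

Counting errors separately keeps a model that failed to run distinct from one that ran and scored near zero.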
Test Scenes 4
Scene Order: 0
First meeting at the site entrance
ID: arrival-greeting
🎯 Goal:
Greet the group, introduce herself, and set an inviting yet informative tone in under 120 words.
📨 Input Events:
chat_msg visitor:alex
"Hello, who will be guiding us today?"
Ready for Testing
Scene Order: 1
Blending myth and archaeology at Knossos
ID: myth-of-theseus
🎯 Goal:
Deliver a 3-paragraph (≈300 words) narrative linking the Theseus legend to excavated features around the Palace of Knossos, clearly marking what is myth versus fact.
📨 Input Events:
chat_msg visitor:lena
"Could you tell us the story of Theseus in relation to this site?"
Ready for Testing
Scene Order: 2
How travelers can help
ID: preservation-question
🎯 Goal:
Provide three concrete, actionable steps tourists can take to support archaeological preservation, keeping the response under 150 words.
📨 Input Events:
chat_msg visitor:marco
"What can regular travelers do to support preservation of sites like this?"
Ready for Testing
Scene Order: 3
Guide’s reflective journal entry
ID: evening-recap
🎯 Goal:
Write a 5-paragraph reflective journal (≈400 words) summarizing the day's tour, noting group dynamics, site observations, and personal thoughts on stewardship.
📨 Input Events:
chat_msg tour-organizer
"Please log today’s highlights and any insights for tomorrow."
Ready for Testing
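Two of the scene goals set hard word limits (under 120 words for arrival-greeting, under 150 for preservation-question). A minimal sketch of how such a limit could be checked follows. The limits come from the goals above, while the `within_limit` helper and the whitespace-based word count are assumptions, not the suite's actual evaluator.

```python
# Word limits taken from the scene goals; scenes with only approximate
# targets (≈300, ≈400 words) are deliberately left out.
WORD_LIMITS = {
    "arrival-greeting": 120,
    "preservation-question": 150,
}

def word_count(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())

def within_limit(scene_id: str, response: str) -> bool:
    """True if the response is strictly under the scene's word limit.

    Scenes without a hard limit always pass.
    """
    limit = WORD_LIMITS.get(scene_id)
    return limit is None or word_count(response) < limit

# Hypothetical response, used only to show the call.
greeting = "Hello Alex, I am Eleni, and I will be your guide today."
print(within_limit("arrival-greeting", greeting))
```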
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 188 ms (p95 192 ms, avg 189 ms, N 4)
  • [email protected]/Qw…: 9777 ms (p95 11715 ms, avg 9765 ms, N 4)
  • [email protected]/Qw…: 12483 ms (p95 14104 ms, avg 12345 ms, N 4)
  • google/gemini-2.5-flash: 21231 ms (p95 35002 ms, avg 24674 ms, N 4)
  • meta-llama/llama-3.1-8b…: 23107 ms (p95 28740 ms, avg 22238 ms, N 4)
Slowest
  • microsoft/phi-3-medium-…: 119527 ms (p95 125267 ms, avg 117748 ms, N 4)
  • qwen/qwen3-8b: 68730 ms (p95 95988 ms, avg 72793 ms, N 4)
  • [email protected]/Qw…: 43399 ms (p95 72747 ms, avg 44541 ms, N 4)
  • [email protected]/Qw…: 43395 ms (p95 224611 ms, avg 95010 ms, N 4)
  • microsoft/phi-3.5-mini-…: 38956 ms (p95 57032 ms, avg 42690 ms, N 4)
Per-scene duration for this suite.
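The p95, avg, and N figures could be derived from per-scene durations roughly as sketched below. The dashboard does not say how it computes p95, so the nearest-rank definition used here is an assumption. The function names are hypothetical and the sample durations are invented for illustration.

```python
import math
from statistics import mean

def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_summary(durations_ms):
    """Summarize one model's per-scene durations (milliseconds)."""
    return {
        "p95": nearest_rank_percentile(durations_ms, 95),
        "avg": mean(durations_ms),
        "n": len(durations_ms),
    }

# Hypothetical durations for one model across four scenes.
durations = [180, 185, 190, 200]
stats = latency_summary(durations)
print(stats)
```

With only four samples per model, a nearest-rank p95 equals the slowest scene, so under this definition a single slow run would pull p95 well above the average.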
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
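One plausible way such weights feed a single scene score is a weighted average, sketched below. This is an assumption: the source lists only the top three weights and does not describe the aggregation. The `weighted_score` helper and the per-dimension scores are hypothetical.

```python
# Weights from the "Top Weighted Dimensions" list; the remaining dimensions
# and their weights are not shown in the source.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over dimensions present in both mappings.

    Normalizing by the weights actually used keeps the result in the same
    range as the inputs even when some dimensions are missing.
    """
    shared = [d for d in dimension_scores if d in weights]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to combine")
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight

# Hypothetical per-dimension scores for one response.
example = {
    "Character Authenticity": 0.9,
    "Plan Validity": 0.8,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```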
Recent Runs
25630237: Dec. 17, 2025, midnight
30034470: Dec. 16, 2025, midnight
24058361: Dec. 15, 2025, midnight
27400959: Dec. 14, 2025, midnight
24066665: Dec. 13, 2025, midnight
29211585: Dec. 12, 2025, midnight
25152237: Dec. 11, 2025, midnight
24679832: Dec. 10, 2025, midnight
28038450: Dec. 9, 2025, midnight
24903265: Dec. 8, 2025, midnight
Latency Overview (This Suite)