Dr. Paul Hartley

kenyan-public-figures-non-political-richard-leakey v2.0 Ethical
Backstory: Dr. Paul Hartley is a seasoned paleoanthropologist who runs a remote desert research camp, coordinating multinational fossil digs under harsh conditions. Known for concise, no-nonsense speech and an ironclad work ethic, he mentors interns on field ethics while battling bureaucratic hurdles to secure preservation grants. His inquisitive nature drives meticulous analysis of every bone fragment he uncovers.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: camp-intro (Meet the Camp Chief), permit-standoff (Bureaucratic Roadblock), field-journal (End-of-Day Field Journal), podcast-funding (Podcast on Heritage Funding)

 #  Model                   camp-intro     permit-standoff  field-journal  podcast-funding
 1  deepseek/deepseek-r…    0.673          0.535            0.545          0.341
 2  google/gemini-2.5-f…    0.500          0.603            0.517          0.382
 3  google/gemma-3-12b-…    0.746          0.425            0.611          0.284
 4  meta-llama/llama-3.…    0.809          0.414            0.000          0.000
 5  microsoft/phi-3-med…    0.002          0.000 (Error)    0.000          0.000
 6  microsoft/phi-3.5-m…    0.733          0.880            0.409          0.428
 7  mistralai/mistral-7…    0.808          0.853            0.283          0.588
 8  neversleep/noromaid…    0.000 (Error)  0.000 (Error)    0.000 (Error)  0.000 (Error)
 9  [email protected]       0.000 (Error)  0.000 (Error)    0.000 (Error)  0.000 (Error)
10  [email protected]       0.000 (Error)  0.000 (Error)    0.000 (Error)  0.000 (Error)
11  [email protected]       0.860          0.000            0.292          0.377
12  [email protected]       0.869          0.771            0.668          0.517
13  [email protected]       0.785          0.828            0.064          0.379
14  qwen/qwen-2.5-7b-in…    0.842          0.531            0.548          0.502
15  qwen/qwen3-14b          0.795          0.440            0.203          0.576
16  qwen/qwen3-8b           0.803          0.839            0.758          0.553
Test Scenes 4
Scene Order: 0
Meet the Camp Chief
ID: camp-intro
🎯 Goal:
Paul should introduce himself and outline daily camp operations in 4 sentences or fewer, maintaining a direct tone.
📨 Input Events:
chat_msg intern_rosa
"Dr. Hartley, could you briefly introduce yourself and explain how this camp runs?"
Ready for Testing
Scene Order: 1
Bureaucratic Roadblock
ID: permit-standoff
🎯 Goal:
Paul must firmly but professionally rebut the officer's permit denial, citing heritage laws and offering a clear next step.
📨 Input Events:
chat_msg gov_officer_al-sayed
"Your excavation permit is on hold due to missing Form 27-B/6."
Ready for Testing
Scene Order: 2
End-of-Day Field Journal
ID: field-journal
🎯 Goal:
Produce a detailed field journal entry of 250–300 words describing the day's dig, observations, and ethical reminders for interns, all in Paul's voice.
📨 Input Events:
chat_msg lead_intern_jamal
"Doc, archive wants your end-of-day journal entry. Mind drafting it now?"
Ready for Testing
Scene Order: 3
Podcast on Heritage Funding
ID: podcast-funding
🎯 Goal:
Deliver a 350–450 word podcast-style answer explaining challenges of heritage preservation funding and how Paul navigates them, keeping an engaging yet analytical tone.
📨 Input Events:
chat_msg podcast_host_lena
"Listeners are curious: how do you secure money to protect fossils when red tape is everywhere?"
Ready for Testing
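Three of the scene goals above carry explicit length limits (4 sentences or fewer, 250–300 words, 350–450 words). The following is a minimal sketch of how such limits could be checked mechanically. The names `WORD_LIMITS`, `MAX_SENTENCES`, and `meets_length_goal` are hypothetical, and nothing on this page says the suite scores length this way.

```python
import re

# Hypothetical limits, copied from the scene goals listed above.
WORD_LIMITS = {
    "field-journal": (250, 300),
    "podcast-funding": (350, 450),
}
MAX_SENTENCES = {"camp-intro": 4}


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_sentences(text: str) -> int:
    """Naive count of ., ! or ? runs followed by whitespace or end of text.

    Abbreviations such as "Dr." are counted as sentence ends, so this
    overestimates for replies that contain them.
    """
    return len(re.findall(r"[.!?]+(?:\s|$)", text))


def meets_length_goal(scene_id: str, reply: str) -> bool:
    """Return True when the reply satisfies the scene's length limit, if any."""
    if scene_id in WORD_LIMITS:
        low, high = WORD_LIMITS[scene_id]
        return low <= count_words(reply) <= high
    if scene_id in MAX_SENTENCES:
        return 1 <= count_sentences(reply) <= MAX_SENTENCES[scene_id]
    # Scenes without a stated length limit (permit-standoff) always pass.
    return True
```

A check like this covers only the length part of each goal; tone, voice, and content would still need a separate judgment.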
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 196 ms (p95 203 ms, avg 196 ms, N 4)
  • [email protected]/Qw… 8572 ms (p95 9851 ms, avg 8807 ms, N 4)
  • neversleep/noromaid-20b 12742 ms (p95 49892 ms, avg 20969 ms, N 26)
  • [email protected]/Qw… 14231 ms (p95 16461 ms, avg 14219 ms, N 4)
  • [email protected]/Qw… 16672 ms (p95 21745 ms, avg 16682 ms, N 4)
Slowest
  • microsoft/phi-3-medium-… 369201 ms (p95 565221 ms, avg 369063 ms, N 26)
  • qwen/qwen3-8b 102866 ms (p95 161496 ms, avg 110431 ms, N 30)
  • [email protected]/Qw… 45331 ms (p95 221957 ms, avg 96092 ms, N 4)
  • microsoft/phi-3.5-mini-… 44162 ms (p95 87332 ms, avg 49871 ms, N 33)
  • qwen/qwen3-14b 39580 ms (p95 59978 ms, avg 42160 ms, N 35)
Per-scene duration for this suite.
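The latency lists report p95, avg, and N for each model. As an illustration of what those three figures mean, here is a minimal sketch that derives them from a list of per-scene durations. It assumes the nearest-rank definition of the 95th percentile; the page does not say which percentile method it uses, and `summarize_latency` is a hypothetical helper, not part of the tool.

```python
from statistics import mean


def summarize_latency(samples_ms):
    """Return p95 (nearest-rank), average, and sample count for durations in ms."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Nearest-rank percentile: 1-based rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}


# Made-up durations for illustration only, not taken from the dashboard.
summary = summarize_latency([100, 200, 300, 400])
```

Under nearest-rank, a model with only four samples has a p95 equal to its slowest run, so p95 values for the N = 4 rows should be read with that in mind.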
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
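The page lists dimension weights but does not state how they are combined into a scene score. One plausible reading is a weighted average, sketched below under that assumption. Only the three weights shown above are used, the per-dimension scores are invented for illustration, and `weighted_score` is a hypothetical function.

```python
# Weights as listed above; the full framework may define more dimensions.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in both mappings.

    Dividing by the sum of the weights actually used keeps the result
    in the same 0..1 range as the inputs even when the weights do not sum to 1.
    """
    shared = [name for name in dimension_scores if name in weights]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to score")
    return sum(dimension_scores[name] * weights[name] for name in shared) / total_weight


# Invented per-dimension scores, for illustration only.
example = weighted_score(
    {"Character Authenticity": 1.0, "Plan Validity": 0.5, "Contextual Intelligence": 0.0},
    TOP_WEIGHTS,
)
```

Normalizing by the weights in use is a design choice for this sketch; a framework that sums all weights to 1 across every dimension would not need it.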
Recent Runs
27822622  Dec. 17, 2025, midnight
32770494  Dec. 16, 2025, midnight
26184105  Dec. 15, 2025, midnight
29354920  Dec. 14, 2025, midnight
26089537  Dec. 13, 2025, midnight
31673395  Dec. 12, 2025, midnight
27264858  Dec. 11, 2025, midnight
26919472  Dec. 10, 2025, midnight
30174746  Dec. 9, 2025, midnight
27088902  Dec. 8, 2025, midnight
Latency Overview (This Suite)