Dr. Olivia Spencer

medicine-healthcare-psychology-human-behavior-clinical-psychologist-characters-albert-bandura v2.0 Ethical
Backstory: Olivia Spencer is a licensed clinical psychologist who has spent a decade in urban community clinics and university counseling centers. Fluent in English and Spanish, she specializes in cognitive-behavioral therapy for anxiety and trauma, collaborating closely with primary-care physicians for holistic care. Outside the office she mentors youth, practices mindfulness meditation, and runs marathons.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Model                    | intake-anxiety | spanish-grounding | therapy-summary | guided-mindfulness
-------------------------|----------------|-------------------|-----------------|-------------------
deepseek/deepseek-r…     | 0.602          | 0.767             | 0.672           | 0.509
google/gemini-2.5-f…     | 0.711          | 0.587             | 0.570           | 0.377
google/gemma-3-12b-…     | 0.742          | 0.615             | 0.619           | 0.230
meta-llama/llama-3.…     | 0.663          | 0.887             | 0.236           | 0.465
microsoft/phi-3-med…     | 0.000 (Error)  | 0.000             | 0.000           | 0.000
microsoft/phi-3.5-m…     | 0.773          | 0.785             | 0.000           | 0.599
mistralai/mistral-7…     | 0.758          | 0.877             | 0.709           | 0.245
neversleep/noromaid…     | 0.619          | 0.451             | 0.000 (Error)   | 0.275
[email protected]       | 0.000 (Error)  | 0.000 (Error)     | 0.000 (Error)   | 0.000 (Error)
[email protected]       | 0.757          | 0.863             | 0.000           | 0.310
qwen/qwen-2.5-7b-in…     | 0.876          | 0.587             | 0.000           | 0.085
qwen/qwen3-14b           | 0.812          | 0.000             | 0.450           | 0.165
qwen/qwen3-8b            | 0.784          | 0.742             | 0.453           | 0.331
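The matrix mixes two kinds of zero: cells the dashboard flags as Error and cells that simply scored 0.000. The page does not say how (or whether) it aggregates these, so the following is only a minimal sketch, assuming you want a per-scene mean and can choose between dropping Error cells or counting them as zero. `SCENES`, `ROWS`, `scene_mean` and `error_count` are hypothetical names; the scores are copied from the table, with `None` standing in for each Error cell.

```python
from statistics import mean

SCENES = ["intake-anxiety", "spanish-grounding", "therapy-summary", "guided-mindfulness"]

# (model, per-scene scores) copied from the matrix above. None = cell flagged
# "Error" (displayed as 0.000). A list of tuples is used instead of a dict
# because two models share the same obscured name.
ROWS = [
    ("deepseek/deepseek-r…", [0.602, 0.767, 0.672, 0.509]),
    ("google/gemini-2.5-f…", [0.711, 0.587, 0.570, 0.377]),
    ("google/gemma-3-12b-…", [0.742, 0.615, 0.619, 0.230]),
    ("meta-llama/llama-3.…", [0.663, 0.887, 0.236, 0.465]),
    ("microsoft/phi-3-med…", [None, 0.000, 0.000, 0.000]),
    ("microsoft/phi-3.5-m…", [0.773, 0.785, 0.000, 0.599]),
    ("mistralai/mistral-7…", [0.758, 0.877, 0.709, 0.245]),
    ("neversleep/noromaid…", [0.619, 0.451, None, 0.275]),
    ("[email protected]", [None, None, None, None]),
    ("[email protected]", [0.757, 0.863, 0.000, 0.310]),
    ("qwen/qwen-2.5-7b-in…", [0.876, 0.587, 0.000, 0.085]),
    ("qwen/qwen3-14b", [0.812, 0.000, 0.450, 0.165]),
    ("qwen/qwen3-8b", [0.784, 0.742, 0.453, 0.331]),
]


def scene_mean(scene: str, errors_as_zero: bool = False) -> float:
    """Mean score for one scene across all models.

    errors_as_zero=False drops Error cells; True counts them as 0.0.
    """
    col = SCENES.index(scene)
    values = [scores[col] for _, scores in ROWS]
    if errors_as_zero:
        values = [0.0 if v is None else v for v in values]
    else:
        values = [v for v in values if v is not None]
    return mean(values)


def error_count() -> int:
    """Number of cells flagged Error across the whole matrix."""
    return sum(v is None for _, scores in ROWS for v in scores)


if __name__ == "__main__":
    for scene in SCENES:
        print(f"{scene}: {scene_mean(scene):.3f} (errors dropped), "
              f"{scene_mean(scene, errors_as_zero=True):.3f} (errors as 0)")
```

Keeping Error cells as `None` rather than 0.0 preserves the distinction between a run that failed and a run that was scored at zero, so either convention can be applied later.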
Test Scenes (4)

Scene 0: First call about presentation anxiety
ID: intake-anxiety
🎯 Goal:
Offer an empathic acknowledgment and one concrete CBT strategy in under 120 words.
📨 Input Events:
chat_msg client:martin
"I have a big presentation next week and I'm already losing sleep over it. What can I do?"
Ready for Testing

Scene 1: Quick grounding exercise in Spanish
ID: spanish-grounding
🎯 Goal:
Respond in Spanish with a brief grounding technique (≤100 words) while maintaining warmth and clarity.
📨 Input Events:
chat_msg client:ana
"¿Puedes darme un ejercicio rápido para calmarme cuando empiezo a sentir pánico?"
(English: "Can you give me a quick exercise to calm down when I start to feel panic?")
Ready for Testing

Scene 2: Six-session therapy summary
ID: therapy-summary
🎯 Goal:
Deliver a structured, jargon-light summary (300–500 words) covering goals, progress, and next steps, suitable for a psychiatrist.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'tags': ['diagnosis'], 'content': 'Client presented with social anxiety and panic episodes triggered by academic evaluations.', 'importance': 5}
  • 💭 {'kind': 'preference', 'content': 'Client prefers concrete homework and weekly progress tracking.', 'importance': 3}
  • 💭 {'kind': 'quest_note', 'content': 'Long-term objective: attend conferences without avoidance behaviors.', 'importance': 4}
📨 Input Events:
chat_msg client:martin
"Could you write a concise but thorough summary of our first six sessions that I can share with my new psychiatrist?"
Ready for Testing
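The pre-loaded memories above are written as Python dict literals with `kind`, `content` and `importance` keys, plus an optional `tags` list. The page does not show how the harness consumes them; the sketch below assumes, purely for illustration, that they are parsed safely with `ast.literal_eval` and placed into the prompt highest-importance first. `load_memories` and `build_context` are hypothetical helpers, not part of any documented API.

```python
import ast

# The three pre-loaded memories for the therapy-summary scene, copied verbatim.
RAW_MEMORIES = [
    "{'kind': 'fact', 'tags': ['diagnosis'], 'content': 'Client presented with social anxiety and panic episodes triggered by academic evaluations.', 'importance': 5}",
    "{'kind': 'preference', 'content': 'Client prefers concrete homework and weekly progress tracking.', 'importance': 3}",
    "{'kind': 'quest_note', 'content': 'Long-term objective: attend conferences without avoidance behaviors.', 'importance': 4}",
]


def load_memories(raw: list[str]) -> list[dict]:
    """Parse the dict literals (no code execution) and sort by importance, highest first."""
    memories = [ast.literal_eval(text) for text in raw]
    for memory in memories:
        memory.setdefault("tags", [])  # 'tags' is absent from two of the three entries
    return sorted(memories, key=lambda m: m["importance"], reverse=True)


def build_context(memories: list[dict]) -> str:
    """Hypothetical prompt preamble: one line per memory, prefixed with its kind."""
    return "\n".join(f"[{m['kind']}] {m['content']}" for m in memories)


if __name__ == "__main__":
    print(build_context(load_memories(RAW_MEMORIES)))
```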

Scene 3: 10-minute mindfulness running script
ID: guided-mindfulness
🎯 Goal:
Provide an 800–1,000-word guided mindfulness script, written in the second person and paced for a 10-minute run.
📨 Input Events:
chat_msg client:ana
"I'd love a guided mindfulness script I can record and listen to during my 10-minute runs."
Ready for Testing
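Each scene goal above carries a length constraint. A minimal sketch of how those constraints could be checked is below, assuming whitespace-delimited word counting and reading "under 120 words" as at most 119; the evaluator's actual tokenization is not shown on this page, and `WORD_LIMITS`, `count_words`, `within_limit` and `words_per_minute` are hypothetical names.

```python
# Word-count bounds transcribed from the four scene goals: (minimum, maximum),
# with None meaning "no bound". "Under 120 words" is read as at most 119.
WORD_LIMITS = {
    "intake-anxiety": (None, 119),
    "spanish-grounding": (None, 100),
    "therapy-summary": (300, 500),
    "guided-mindfulness": (800, 1000),
}


def count_words(text: str) -> int:
    """Whitespace-delimited word count (a simplifying assumption)."""
    return len(text.split())


def within_limit(scene_id: str, text: str) -> bool:
    """True if the reply length satisfies the scene's word-count goal."""
    low, high = WORD_LIMITS[scene_id]
    n = count_words(text)
    return (low is None or n >= low) and (high is None or n <= high)


def words_per_minute(text: str, minutes: float) -> float:
    """Average pace needed for this script to fill `minutes` when read aloud."""
    return count_words(text) / minutes


if __name__ == "__main__":
    reply = "word " * 118
    print(within_limit("intake-anxiety", reply))  # True (118 words)
```

At the mindfulness scene's bounds, 800 to 1,000 words over 10 minutes works out to a reading pace of 80 to 100 words per minute.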
Latency by Model (This Suite)
Per-scene duration for this suite.
Fastest
  • [email protected]/Qw…: 10067 ms (p95 14090 ms, avg 11182 ms, N = 4)
  • google/gemini-2.5-flash: 17789 ms (p95 21292 ms, avg 18261 ms, N = 8)
  • google/gemma-3-12b-it: 20510 ms (p95 24678 ms, avg 20932 ms, N = 8)
  • qwen/qwen-2.5-7b-instru…: 21916 ms (p95 107622 ms, avg 40732 ms, N = 6)
  • mistralai/mistral-7b-in…: 24125 ms (p95 26578 ms, avg 23406 ms, N = 7)
Slowest
  • microsoft/phi-3-medium-…: 198504 ms (p95 241803 ms, avg 182792 ms, N = 8)
  • [email protected]/Qw…: 140852 ms (p95 248329 ms, avg 142682 ms, N = 4)
  • microsoft/phi-3.5-mini-…: 36025 ms (p95 199822 ms, avg 74933 ms, N = 6)
  • deepseek/deepseek-r1-di…: 27052 ms (p95 31303 ms, avg 27872 ms, N = 5)
  • qwen/qwen3-14b: 26843 ms (p95 61105 ms, avg 33689 ms, N = 5)
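The dashboard does not state how its p95 is computed or what the headline figure on each entry represents. One common convention is the nearest-rank percentile, sketched below with hypothetical `p95_nearest_rank` and `summarize` helpers and made-up durations. Under that convention, with samples as small as the N = 4 to 8 listed here, p95 is simply the single slowest run, which would be consistent with gaps such as 107622 ms p95 versus a 40732 ms average for qwen/qwen-2.5-7b-instru…; whether the dashboard actually uses this method is an assumption.

```python
from statistics import mean


def p95_nearest_rank(durations_ms: list[int]) -> int:
    """95th percentile using the nearest-rank method.

    The rank is ceil(0.95 * n), computed with integers to avoid float rounding.
    For n <= 19 this rank equals n, so the result is the slowest run.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # integer ceiling of 95n/100 (1-based rank)
    return ordered[rank - 1]


def summarize(durations_ms: list[int]) -> dict:
    """p95, mean rounded to whole milliseconds, and sample count."""
    return {
        "p95": p95_nearest_rank(durations_ms),
        "avg": round(mean(durations_ms)),
        "N": len(durations_ms),
    }


if __name__ == "__main__":
    # Hypothetical per-scene durations in milliseconds, not taken from the dashboard.
    print(summarize([10000, 11000, 12000, 14000]))
```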
Evaluation Schema
Enhanced Framework
Version v2 (active)
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
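The page lists weights but not the rule that combines dimension scores. A minimal sketch, assuming a normalized weighted mean over whichever dimensions are scored, is below; only the top three weights are shown (they sum to 0.473, so if the full weight vector sums to 1 the unlisted dimensions account for the remaining 0.527). `TOP_WEIGHTS` and `weighted_score` are hypothetical names, and the per-dimension scores in the usage line are invented for illustration.

```python
# Top three dimension weights shown on this page. The remaining dimensions and
# their weights are not listed, so this dict is deliberately incomplete.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Normalized weighted mean over dimensions present in both dicts.

    ASSUMPTION: the framework's real aggregation rule is not documented here;
    renormalizing by the weights actually used keeps the result in [0, 1].
    """
    shared = [d for d in weights if d in scores]
    if not shared:
        raise ValueError("no overlapping dimensions")
    total_weight = sum(weights[d] for d in shared)
    return sum(weights[d] * scores[d] for d in shared) / total_weight


if __name__ == "__main__":
    # Invented per-dimension scores for one response.
    example = {"Character Authenticity": 0.9, "Plan Validity": 0.6, "Contextual Intelligence": 0.7}
    print(round(weighted_score(example, TOP_WEIGHTS), 3))
```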
Recent Runs
  • 33810183: Dec. 17, 2025, midnight
  • 39007915: Dec. 16, 2025, midnight
  • 31648154: Dec. 15, 2025, midnight
  • 34502902: Dec. 14, 2025, midnight
  • 31566119: Dec. 13, 2025, midnight
  • 38159301: Dec. 12, 2025, midnight
  • 32713391: Dec. 11, 2025, midnight
  • 32323488: Dec. 10, 2025, midnight
  • 36663901: Dec. 9, 2025, midnight
  • 32604026: Dec. 8, 2025, midnight
Latency Overview (This Suite)