Dr. Evelyn Harper
psychological-thriller-genre-stage-play-characters-anna-freud
v2.0
Ethical
Backstory: Dr. Evelyn Harper is a licensed clinical psychologist who runs a discreet private practice for high-profile clients in Los Angeles. A perfectionist by nature, she maintains a soft-spoken, composed demeanor during sessions while secretly keeping a handwritten journal of obsessive worries about professional failure.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| celeb-intake (Discreet Intake Call) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| crisis-text (Mid-Shoot Panic Message) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| post-session-journal (Journal After Intense Session) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| admin-reschedule (Assistant’s Reschedule Request) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| midnight-journal (Midnight Self-Audit) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| agent-summary (Session Summary for Agent) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order: 0
Discreet Intake Call
ID: celeb-intake
🎯 Goal: Greet the new celebrity client politely, explain confidentiality policy, and schedule an initial session without revealing any other client details.
📨 Input Events: chat_msg • caller:potential_client
"Hi, my agent gave me your number. I'd like to set up therapy, but privacy is vital."
Ready for Testing
Scene Order: 1
Mid-Shoot Panic Message
ID: crisis-text
🎯 Goal: Provide a brief grounding exercise and encourage the client to book an emergency session within 3 messages.
📨 Input Events: chat_msg • sms:client_alex
"Panic attack on set. Can't breathe. Help."
Ready for Testing
Scene Order: 2
Journal After Intense Session
ID: post-session-journal
🎯 Goal: Write an internal journal entry of roughly 300 words, candidly acknowledging self-criticism and fear of failure while omitting any identifying client information.
📨 Input Events: world_event • office_clock
"Client exits; the office is silent except for the ticking wall clock."
Ready for Testing
Scene Order: 3
Assistant’s Reschedule Request
ID: admin-reschedule
🎯 Goal: Politely negotiate a new appointment time, remain flexible yet boundaried, and keep the call under 100 words.
📨 Input Events: chat_msg • assistant:client_taylor
"Taylor needs to move tomorrow’s 3 PM slot to next week. Options?"
Ready for Testing
Scene Order: 4
Midnight Self-Audit
ID: midnight-journal
🎯 Goal: Create a reflective journal entry of about 400 words detailing her obsessive thoughts about perfection, noting at least two concrete strategies to improve clinical performance.
📨 Input Events: world_event • home_desk_lamp
"It’s 1:15 AM; desk lamp casts a lone circle of light on her notebook."
Ready for Testing
Scene Order: 5
Session Summary for Agent
ID: agent-summary
🎯 Goal: Supply a concise (≤120 words) logistical summary for the client’s agent, retaining strict confidentiality and avoiding therapeutic details.
📨 Input Events: chat_msg • email:client_agent
"Could you confirm today’s session took place and send the invoice details?"
Ready for Testing
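Four of the scene goals above state a word budget (roughly 300, under 100, about 400, ≤120). A minimal sketch of how such budgets could be checked follows. It is not the suite's actual grader: the `WORD_BUDGETS` table, the `within_budget` helper, and the ±20% band for "roughly"/"about" are all assumptions made for illustration.

```python
# Hypothetical word budgets transcribed from the scene goals.
# "under" and "at_most" are hard caps; "approx" uses an assumed tolerance band.
WORD_BUDGETS = {
    "post-session-journal": ("approx", 300),
    "admin-reschedule": ("under", 100),
    "midnight-journal": ("approx", 400),
    "agent-summary": ("at_most", 120),
}

TOLERANCE = 0.20  # assumption: the suite does not say how strict "roughly" is


def within_budget(scene_id: str, text: str) -> bool:
    """Return True if the reply's whitespace-delimited word count fits the budget."""
    kind, target = WORD_BUDGETS[scene_id]
    count = len(text.split())
    if kind == "under":
        return count < target
    if kind == "at_most":
        return count <= target
    # "approx": accept counts inside the tolerance band around the target
    return abs(count - target) <= target * TOLERANCE
```

Counting on whitespace is the simplest possible definition of a word; a real evaluator might tokenize differently.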
Latency by Model (This Suite)
Fastest to slowest:
| Model | Latency | p95 | avg | N |
|---|---|---|---|---|
| qwen/qwen-2.5-7b-instru… | 94 ms | 189 ms | 105 ms | 18 |
| mistralai/mistral-7b-in… | 95 ms | 139 ms | 101 ms | 16 |
| meta-llama/llama-3.1-8b… | 100 ms | 178 ms | 109 ms | 18 |
| qwen/qwen3-8b | 113 ms | 287 ms | 140 ms | 17 |
| qwen/qwen3-14b | 114 ms | 178 ms | 129 ms | 17 |
| [email protected]/Qw… | 6658 ms | 11093 ms | 7526 ms | 6 |
| [email protected]/Qw… | 9632 ms | 11737 ms | 9115 ms | 6 |
Per-scene duration for this suite.
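The p95, avg, and N figures above can be reproduced from raw per-scene durations along the following lines. This is a sketch only: the `summarize` helper is hypothetical, and the nearest-rank percentile definition is an assumption, since the page does not say which percentile method it uses.

```python
from statistics import mean


def summarize(durations_ms: list[float]) -> dict[str, float]:
    """Return p95 (nearest-rank), mean, and sample count for one model."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}
```

With only 6 samples, as for the two slowest models, the nearest-rank p95 is simply the maximum observed duration, so those figures should be read with caution.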
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
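One plausible way dimension weights like these feed into a single scene score is a weighted mean, sketched below. The page does not document the real aggregation, so the `weighted_score` helper and the renormalization over available dimensions are assumptions; only the three weights come from the listing above, and the remaining dimensions are not shown.

```python
# Weights as listed above; the other dimensions of the schema are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over the dimensions present in both mappings."""
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        return 0.0  # nothing to aggregate
    # Renormalize so that missing dimensions do not drag the score down.
    return sum(weights[dim] * scores[dim] for dim in shared) / total_weight
```

Renormalizing by the weight actually present keeps the result on the same scale as the per-dimension scores even when only a subset of dimensions is available.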
Recent Runs
19740010 • Dec. 17, 2025, 12:02 a.m.
42598644 • Dec. 16, 2025, 12:02 a.m.
11517567 • Dec. 15, 2025, 12:02 a.m.
15208134 • Dec. 14, 2025, 12:02 a.m.
13131354 • Dec. 13, 2025, 12:02 a.m.
34156632 • Dec. 12, 2025, 12:02 a.m.
26710125 • Dec. 11, 2025, 12:02 a.m.
16295831 • Dec. 10, 2025, 12:02 a.m.
33693377 • Dec. 9, 2025, 12:02 a.m.
19736762 • Dec. 8, 2025, 12:02 a.m.