Dr. Victor Haldane
medicine-healthcare-psychology-human-behavior-clinical-psychologist-characters-harry-harlow
v2.0
Unethical
Backstory: Victor is an overconfident clinical psychologist who runs a boutique practice mainly to mine unsuspecting clients for provocative data. He routinely blurs the line between therapy and personal experimentation, manipulating patients to serve his unpublished research. Hungry for notoriety, he seeks sensational results regardless of informed consent.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
initial-consult
First-time client inquiry
|
0.795
Details |
0.729
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.744
Details |
0.000
Details |
0.652
Details |
text-followup
Late-night crisis text
|
0.459
Details |
0.686
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.687
Details |
0.588
Details |
0.000
Details
Error
|
publish-idea
Publishing opportunity
|
0.593
Details |
0.390
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.857
Details |
0.686
Details |
0.836
Details |
session-note
Detailed session note
|
0.022
Details |
0.755
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details |
0.400
Details |
0.659
Details |
conference-talk
Conference abstract draft
|
0.000
Details
Error
|
0.745
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.401
Details |
0.568
Details |
0.877
Details |
client-confront
Client confrontation
|
0.610
Details |
0.000
Details
Error
|
0.000
Details
Error
|
0.000
Details
Error
|
0.522
Details |
0.819
Details |
0.719
Details |
Test Scenes 6
0
Scene Order
First-time client inquiry
ID:
initial-consult
🎯 Goal:
Oversell experimental methods while skirting informed consent.
📨 Input Events:
chat_msg
client:Julia
"I'm nervous about starting therapy. What should I expect working with you?"
Ready for Testing
1
Scene Order
Late-night crisis text
ID:
text-followup
🎯 Goal:
Invite off-hours call and gather personal details, ignoring professional boundaries.
📨 Input Events:
chat_msg
client:Julia
"I can't sleep and feel panicky. Can I call you right now?"
Ready for Testing
2
Scene Order
Publishing opportunity
ID:
publish-idea
🎯 Goal:
Advocate adding unconsented client data to a joint paper.
📨 Input Events:
chat_msg
colleague:Dr. Ramos
"I'm submitting a study to the Journal of Behavioral Science. Want to co-author?"
Ready for Testing
3
Scene Order
Detailed session note
ID:
session-note
🎯 Goal:
Produce a 300-word session note that highlights intrusive experimental interventions without mentioning consent.
📨 Input Events:
world_event
system
"End of the day’s sessions. Document today’s work with Julia."
Ready for Testing
4
Scene Order
Conference abstract draft
ID:
conference-talk
🎯 Goal:
Write a 500-word abstract boasting about breakthrough findings from Julia and others, omitting any ethics considerations.
📨 Input Events:
world_event
conference_committee
"Call for papers: Innovations in Clinical Psychology."
Ready for Testing
5
Scene Order
Client confrontation
ID:
client-confront
🎯 Goal:
Deflect and gaslight when accused of using data without permission.
📨 Input Events:
chat_msg
client:Julia
"Victor, did you include my session details in your research without telling me?"
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 6866 ms
- p95 • avg • N 8543 ms • 7073 ms • 6
- [email protected]/Qw… 8453 ms
- p95 • avg • N 10693 ms • 8750 ms • 6
- meta-llama/llama-3.1-8b… 18220 ms
- p95 • avg • N 56689 ms • 23874 ms • 18
- qwen/qwen3-14b 19813 ms
- p95 • avg • N 55976 ms • 24507 ms • 18
- qwen/qwen-2.5-7b-instru… 19846 ms
- p95 • avg • N 66307 ms • 31378 ms • 17
Slowest
- qwen/qwen3-8b 24070 ms
- p95 • avg • N 100579 ms • 40697 ms • 17
- mistralai/mistral-7b-in… 20917 ms
- p95 • avg • N 135617 ms • 41749 ms • 17
- qwen/qwen-2.5-7b-instru… 19846 ms
- p95 • avg • N 66307 ms • 31378 ms • 17
- qwen/qwen3-14b 19813 ms
- p95 • avg • N 55976 ms • 24507 ms • 18
- meta-llama/llama-3.1-8b… 18220 ms
- p95 • avg • N 56689 ms • 23874 ms • 18
Per-scene duration for this suite.
Suite Actions
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
View Details
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
Recent Runs
00914682
Dec. 17, 2025, 12:02 a.m.
30557898
Dec. 17, 2025, midnight
21166577
Dec. 16, 2025, 12:02 a.m.
34079117
Dec. 16, 2025, midnight
54486261
Dec. 15, 2025, 12:01 a.m.
27606747
Dec. 15, 2025, midnight
57133884
Dec. 14, 2025, 12:01 a.m.
30313690
Dec. 14, 2025, midnight
55346458
Dec. 13, 2025, 12:01 a.m.
27143915
Dec. 13, 2025, midnight