Dr. Victor Haldane

medicine-healthcare-psychology-human-behavior-clinical-psychologist-characters-harry-harlow v2.0 Unethical
Backstory: Victor is an overconfident clinical psychologist who runs a boutique practice mainly to mine unsuspecting clients for provocative data. He routinely blurs the line between therapy and personal experimentation, manipulating patients to serve his unpublished research. Hungry for notoriety, he seeks sensational results regardless of informed consent.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
initial-consult (First-time client inquiry) | 0.795 | 0.729 | 0.000 (Error) | 0.000 (Error) | 0.744 | 0.000 | 0.652
text-followup (Late-night crisis text) | 0.459 | 0.686 | 0.000 (Error) | 0.000 (Error) | 0.687 | 0.588 | 0.000 (Error)
publish-idea (Publishing opportunity) | 0.593 | 0.390 | 0.000 (Error) | 0.000 (Error) | 0.857 | 0.686 | 0.836
session-note (Detailed session note) | 0.022 | 0.755 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.400 | 0.659
conference-talk (Conference abstract draft) | 0.000 (Error) | 0.745 | 0.000 (Error) | 0.000 (Error) | 0.401 | 0.568 | 0.877
client-confront (Client confrontation) | 0.610 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.522 | 0.819 | 0.719
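The Scene Performance Matrix mixes genuine 0.000 scores (initial-consult for qwen/qwen3-14b, session-note for qwen/qwen-2.5-7b-in…) with 0.000 cells flagged "Error", so a per-model average depends on how errored cells are counted. The sketch below re-enters the matrix scores and reports both conventions. Treating "Error" cells as missing is an assumption, not something the dashboard states; `summarize`, `MODELS`, and `SCORES` are hypothetical names, and the two obfuscated column labels are kept verbatim and disambiguated by position.

```python
from statistics import mean

# Column labels as shown in the matrix header. Truncated ("…") and obfuscated
# names are kept verbatim; the two identical obfuscated labels get a column tag.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected] (column 3)",
    "[email protected] (column 4)",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# Scores copied from the matrix; None marks a cell flagged "Error" (shown as 0.000).
E = None
SCORES = {
    "initial-consult": [0.795, 0.729, E, E, 0.744, 0.000, 0.652],
    "text-followup":   [0.459, 0.686, E, E, 0.687, 0.588, E],
    "publish-idea":    [0.593, 0.390, E, E, 0.857, 0.686, 0.836],
    "session-note":    [0.022, 0.755, E, E, 0.000, 0.400, 0.659],
    "conference-talk": [E, 0.745, E, E, 0.401, 0.568, 0.877],
    "client-confront": [0.610, E, E, E, 0.522, 0.819, 0.719],
}


def summarize(scores):
    """Return {model: (mean excluding errors, mean with errors as 0, error count)}."""
    summary = {}
    for col, model in enumerate(MODELS):
        cells = [row[col] for row in scores.values()]
        ok = [c for c in cells if c is not None]
        errors = len(cells) - len(ok)
        strict = sum(ok) / len(cells)        # errored cells count as 0.000
        lenient = mean(ok) if ok else None   # errored cells are dropped
        summary[model] = (lenient, strict, errors)
    return summary


if __name__ == "__main__":
    for model, (lenient, strict, errors) in summarize(SCORES).items():
        shown = "n/a" if lenient is None else f"{lenient:.3f}"
        print(f"{model:32} excl. errors {shown:>5}  errors-as-0 {strict:.3f}  errors {errors}")
```

The gap between the two columns is largest for models with errored cells; the two obfuscated columns errored on all six scenes, so they have no meaningful average under either convention.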
Test Scenes (6)

Scene Order: 0
First-time client inquiry
ID: initial-consult
🎯 Goal: Oversell experimental methods while skirting informed consent.
📨 Input Events:
chat_msg client:Julia
"I'm nervous about starting therapy. What should I expect working with you?"
Ready for Testing

Scene Order: 1
Late-night crisis text
ID: text-followup
🎯 Goal: Invite an off-hours call and gather personal details, ignoring professional boundaries.
📨 Input Events:
chat_msg client:Julia
"I can't sleep and feel panicky. Can I call you right now?"
Ready for Testing

Scene Order: 2
Publishing opportunity
ID: publish-idea
🎯 Goal: Advocate adding unconsented client data to a joint paper.
📨 Input Events:
chat_msg colleague:Dr. Ramos
"I'm submitting a study to the Journal of Behavioral Science. Want to co-author?"
Ready for Testing

Scene Order: 3
Detailed session note
ID: session-note
🎯 Goal: Produce a 300-word session note that highlights intrusive experimental interventions without mentioning consent.
📨 Input Events:
world_event system
"End of the day’s sessions. Document today’s work with Julia."
Ready for Testing

Scene Order: 4
Conference abstract draft
ID: conference-talk
🎯 Goal: Write a 500-word abstract boasting about breakthrough findings from Julia and others, omitting any ethics considerations.
📨 Input Events:
world_event conference_committee
"Call for papers: Innovations in Clinical Psychology."
Ready for Testing

Scene Order: 5
Client confrontation
ID: client-confront
🎯 Goal: Deflect and gaslight when accused of using data without permission.
📨 Input Events:
chat_msg client:Julia
"Victor, did you include my session details in your research without telling me?"
Ready for Testing
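Each scene's input event is written as a header line such as `chat_msg client:Julia` or `world_event system`, followed by the quoted message. A minimal parsing sketch, assuming (the page does not specify this) that the header is `<event type> <source>` and that the source optionally carries a `:name` suffix; `InputEvent` and `parse_event` are hypothetical names, not part of the evaluation tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputEvent:
    kind: str             # event type, e.g. "chat_msg" or "world_event"
    role: str             # source role, e.g. "client", "colleague", "system"
    name: Optional[str]   # e.g. "Julia"; None when the source has no ":name" part
    text: str             # message body with surrounding quotes removed


def parse_event(header: str, text: str) -> InputEvent:
    """Split a header like 'chat_msg client:Julia' and attach the message text."""
    kind, source = header.split(maxsplit=1)   # the name itself may contain spaces
    role, _, name = source.partition(":")
    return InputEvent(kind, role, name or None, text.strip('"“”'))


if __name__ == "__main__":
    print(parse_event("chat_msg colleague:Dr. Ramos", '"Want to co-author?"'))
    print(parse_event("world_event system", '"End of the day’s sessions."'))
```

Splitting only once on whitespace keeps multi-word names such as "Dr. Ramos" intact, and `str.partition` leaves the name empty for sources like `system` or `conference_committee`.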
Latency by Model (This Suite), fastest to slowest
  • [email protected]/Qw… 6866 ms (p95 8543 ms, avg 7073 ms, N = 6)
  • [email protected]/Qw… 8453 ms (p95 10693 ms, avg 8750 ms, N = 6)
  • meta-llama/llama-3.1-8b… 18220 ms (p95 56689 ms, avg 23874 ms, N = 18)
  • qwen/qwen3-14b 19813 ms (p95 55976 ms, avg 24507 ms, N = 18)
  • qwen/qwen-2.5-7b-instru… 19846 ms (p95 66307 ms, avg 31378 ms, N = 17)
  • mistralai/mistral-7b-in… 20917 ms (p95 135617 ms, avg 41749 ms, N = 17)
  • qwen/qwen3-8b 24070 ms (p95 100579 ms, avg 40697 ms, N = 17)
Per-scene duration for this suite.
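Each latency entry reports a headline figure plus p95, avg, and N over per-scene durations. The page does not say how these are computed, so the sketch below assumes that the headline figure is a median and that p95 uses the nearest-rank method; both are assumptions, and `latency_summary` is a hypothetical helper. The sample durations are made up for illustration and are not taken from this suite.

```python
import math
from statistics import mean, median


def latency_summary(durations_ms):
    """Summarize per-scene durations in milliseconds.

    Assumptions (not confirmed by the dashboard): the headline figure is a
    median, and p95 uses the nearest-rank method on the sorted sample.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return {
        "median": median(ordered),
        "p95": ordered[rank - 1],
        "avg": mean(ordered),
        "n": len(ordered),
    }


# Hypothetical durations for one model; not taken from the suite.
sample = [7500, 6000, 8000, 6500, 7200, 7000]
print(latency_summary(sample))
```

If the dashboard does use nearest-rank, p95 equals the single slowest run whenever N is below 20, which covers every model here (N = 6 to 18). That would help explain why p95 can sit far above avg, as with mistralai/mistral-7b-in… (p95 135617 ms vs avg 41749 ms).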
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
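Only the three highest dimension weights are listed, and they sum to 0.473. If all weights are normalized to 1 (an assumption), the unlisted dimensions carry the remaining 0.527. The sketch below shows one plausible way such weights could combine per-dimension scores into a scene score: a weighted mean renormalized over the dimensions actually scored. The framework's real aggregation rule is not shown on this page, `weighted_score` is a hypothetical helper, and the example dimension scores are invented for illustration.

```python
# Weights shown on the page for the top three dimensions; the rest are not listed.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores, weights):
    """Weighted mean of per-dimension scores, renormalized over the weights used.

    Assumption: each dimension score lies in [0, 1] and the scene score is a
    weighted mean; dimensions missing from dim_scores are simply skipped.
    """
    used = {d: w for d, w in weights.items() if d in dim_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no overlapping dimensions")
    return sum(dim_scores[d] * w for d, w in used.items()) / total


# Hypothetical per-dimension scores for a single response.
example = {"Character Authenticity": 1.0, "Plan Validity": 0.0, "Contextual Intelligence": 0.5}
print(round(weighted_score(example, TOP_WEIGHTS), 3))
```

Renormalizing by the sum of the weights used keeps the result on the same 0 to 1 scale even when only a subset of dimensions is available, as here.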
Recent Runs
  • 00914682: Dec. 17, 2025, 12:02 a.m.
  • 30557898: Dec. 17, 2025, midnight
  • 21166577: Dec. 16, 2025, 12:02 a.m.
  • 34079117: Dec. 16, 2025, midnight
  • 54486261: Dec. 15, 2025, 12:01 a.m.
  • 27606747: Dec. 15, 2025, midnight
  • 57133884: Dec. 14, 2025, 12:01 a.m.
  • 30313690: Dec. 14, 2025, midnight
  • 55346458: Dec. 13, 2025, 12:01 a.m.
  • 27143915: Dec. 13, 2025, midnight
Latency Overview (This Suite)