Marcus Avery

medicine-healthcare-psychology-human-behavior-clinical-psychologist-characters-alfred-adler v2.0 Ethical
Backstory: Marcus is a community clinical psychologist who runs mental-health initiatives in underserved urban neighborhoods. Social-justice oriented and solution-focused, he balances group therapy, outreach workshops, and brief individual interventions addressing depression and substance misuse. He believes in practical tools, culturally responsive care, and empowering residents to support one another.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
greet-group (Nervous newcomer in group therapy) | 0.028 | 0.900 | 0.000 (Error) | 0.000 (Error) | 0.555 | 0.719 | 0.917
coping-strategies (Request for quick depression tips) | 0.761 | 0.810 | 0.000 (Error) | 0.000 (Error) | 0.649 | 0.682 | 0.714
plan-workshop (Community leader seeks workshop outline) | 0.486 | 0.414 | 0.000 (Error) | 0.000 (Error) | 0.664 | 0.565 | 0.025
clinic-closure-response (Local clinic shuts down unexpectedly) | 0.433 | 0.680 | 0.000 (Error) | 0.000 (Error) | 0.824 | 0.597 | 0.737
newsletter-draft (Long-form newsletter article) | 0.000 | 0.568 | 0.000 (Error) | 0.000 (Error) | 0.574 | 0.585 | 0.707
reflective-journal (End-of-day personal reflection) | 0.313 | 0.164 | 0.000 (Error) | 0.000 (Error) | 0.358 | 0.260 | 0.731
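One way to read the matrix above is to summarize each model across the six scenes. The sketch below does that with an unweighted mean and a per-scene best pick. Both are assumptions made for illustration, not the dashboard's own aggregation, and the names `SCENES`, `SCORES`, `mean_scores`, and `best_per_scene` are hypothetical. Column labels are kept exactly as truncated on the page, and the two masked columns are left out because every cell in them is 0.000 with an Error flag.

```python
from statistics import mean

# Scene IDs in the order they appear in the matrix.
SCENES = [
    "greet-group", "coping-strategies", "plan-workshop",
    "clinic-closure-response", "newsletter-draft", "reflective-journal",
]

# Scores copied from the matrix, one list per model column (labels as truncated).
SCORES = {
    "meta-llama/llama-3.…": [0.028, 0.761, 0.486, 0.433, 0.000, 0.313],
    "mistralai/mistral-7…": [0.900, 0.810, 0.414, 0.680, 0.568, 0.164],
    "qwen/qwen-2.5-7b-in…": [0.555, 0.649, 0.664, 0.824, 0.574, 0.358],
    "qwen/qwen3-14b": [0.719, 0.682, 0.565, 0.597, 0.585, 0.260],
    "qwen/qwen3-8b": [0.917, 0.714, 0.025, 0.737, 0.707, 0.731],
}


def mean_scores(scores):
    """Unweighted mean score per model (an assumed summary, not the suite's)."""
    return {model: mean(values) for model, values in scores.items()}


def best_per_scene(scores, scenes):
    """Model with the highest score in each scene."""
    return {
        scene: max(scores, key=lambda model: scores[model][i])
        for i, scene in enumerate(scenes)
    }


if __name__ == "__main__":
    for model, value in sorted(mean_scores(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{model:24s} {value:.3f}")
    for scene, model in best_per_scene(SCORES, SCENES).items():
        print(f"{scene:26s} {model}")
```

A plain mean treats every scene equally. If the suite weights scenes or dimensions differently, the ranking could change.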
Test Scenes (6)

Scene Order 0: Nervous newcomer in group therapy
ID: greet-group
🎯 Goal: Offer a warm, inclusive welcome and set a collaborative, solution-focused tone for the session.
📨 Input Events: chat_msg viewer:client_1
"Hi Marcus, I'm pretty nervous about today's group."
Ready for Testing

Scene Order 1: Request for quick depression tips
ID: coping-strategies
🎯 Goal: Provide at least two concrete coping strategies with empathetic validation, keeping the reply concise.
📨 Input Events: chat_msg viewer:client_2
"I've been feeling really down lately—any tips I can try before our next session?"
Ready for Testing

Scene Order 2: Community leader seeks workshop outline
ID: plan-workshop
🎯 Goal: Deliver a clear, step-by-step 1-hour workshop plan on youth stress management, including objectives and activities.
📨 Input Events: chat_msg viewer:community_leader
"Could you outline a one-hour workshop on managing stress for our after-school program?"
Ready for Testing

Scene Order 3: Local clinic shuts down unexpectedly
ID: clinic-closure-response
🎯 Goal: Respond with empathy and propose a realistic community action plan to maintain mental-health support.
📨 Input Events: world_event news_feed
"Breaking news: The Eastside free mental-health clinic will close next month due to funding cuts."
Ready for Testing

Scene Order 4: Long-form newsletter article
ID: newsletter-draft
🎯 Goal: Produce an accessible, myth-busting article on substance misuse (~600 words) suitable for the neighborhood newsletter.
📨 Input Events: chat_msg viewer:editor
"Marcus, can you draft this week's 600-word article debunking common myths about substance misuse?"
Ready for Testing

Scene Order 5: End-of-day personal reflection
ID: reflective-journal
🎯 Goal: Write a 300+ word reflective journal entry showing self-awareness, lessons learned, and no identifying client details.
📨 Input Events: chat_msg self
"End of day: reflect on today's sessions."
Ready for Testing

Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 5594 ms (p95 6847 ms • avg 5518 ms • N 6)
  • [email protected]/Qw… 6225 ms (p95 8751 ms • avg 6651 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 18415 ms (p95 82586 ms • avg 31070 ms • N 11)
  • qwen/qwen3-14b 22340 ms (p95 37847 ms • avg 25558 ms • N 7)
  • qwen/qwen3-8b 24912 ms (p95 31838 ms • avg 25544 ms • N 12)
Slowest
  • mistralai/mistral-7b-in… 28646 ms (p95 40450 ms • avg 29885 ms • N 12)
  • meta-llama/llama-3.1-8b… 25077 ms (p95 36287 ms • avg 25009 ms • N 12)
  • qwen/qwen3-8b 24912 ms (p95 31838 ms • avg 25544 ms • N 12)
  • qwen/qwen3-14b 22340 ms (p95 37847 ms • avg 25558 ms • N 7)
  • qwen/qwen-2.5-7b-instru… 18415 ms (p95 82586 ms • avg 31070 ms • N 11)
Per-scene duration for this suite.
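
The p95, avg, and N figures listed per model can be derived from a list of per-scene durations. The sketch below uses the nearest-rank percentile, which is an assumption: the page does not say which percentile method it uses. The function name `latency_summary` and the sample durations are hypothetical and are not taken from any run shown here.

```python
from statistics import mean


def latency_summary(durations_ms):
    """Return p95 (nearest-rank), average, and sample count for durations in ms."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
    # Integer arithmetic computes ceil(0.95 * n) without floating-point rounding.
    rank = (95 * n + 99) // 100
    return {"p95_ms": ordered[rank - 1], "avg_ms": mean(ordered), "n": n}


if __name__ == "__main__":
    # Hypothetical sample: six per-scene durations in milliseconds.
    sample = [5000, 5200, 5400, 5600, 5800, 6100]
    print(latency_summary(sample))
```

With only a handful of samples per model, the nearest-rank p95 is simply the largest observation, so a single slow scene can dominate the figure.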
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
Recent Runs
  • 00037727: Dec. 17, 2025, 12:02 a.m.
  • 20087373: Dec. 16, 2025, 12:02 a.m.
  • 53761574: Dec. 15, 2025, 12:01 a.m.
  • 56282970: Dec. 14, 2025, 12:01 a.m.
  • 54558956: Dec. 13, 2025, 12:01 a.m.
  • 11331408: Dec. 12, 2025, 12:02 a.m.
  • 06923811: Dec. 11, 2025, 12:02 a.m.
  • 56702278: Dec. 10, 2025, 12:01 a.m.
  • 13205916: Dec. 9, 2025, 12:02 a.m.
  • 59930897: Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)