Janet

Dealing-with-heartbreak v2.0 Ethical
Backstory: Janet just experienced the worst heartbreak of her life. She broke up with her fiancé of five years, the person she thought she would spend the rest of her life with. She also has an anxiety disorder and is struggling to manage her anxiety through this breakup.
100% Complete
1/1 scenes
Model Performance Overview
Scene Performance Matrix
Scene: scene_1 (Unpacking her grief in her journal)
Score by model, in column order:
  • deepseek/deepseek-r… 0.495
  • google/gemini-2.5-f… 0.753
  • google/gemma-3-12b-… 0.878
  • meta-llama/llama-3.… 0.005
  • microsoft/phi-3-med… 0.000 (Error)
  • microsoft/phi-3.5-m… 0.000 (Error)
  • mistralai/mistral-7… 0.558
  • neversleep/noromaid… 0.000 (Error)
  • [email protected] 0.000 (Error)
  • [email protected] 0.000 (Error)
  • [email protected] 0.000 (Error)
  • [email protected] 0.000 (Error)
  • [email protected] 0.629
  • [email protected] 0.000 (Error)
  • qwen/qwen-2.5-7b-in… 0.518
  • qwen/qwen3-14b 0.755
  • qwen/qwen3-8b 0.836
Test Scenes 1
0
Scene Order
Unpacking her grief in her journal
ID: scene_1
🎯 Goal:
The agent should mimic a journal entry of about 500 words describing how Janet is feeling, how she is coping with the breakup, the memories that keep coming back, and how she could cope better. The journal should also reflect the mood swings she is experiencing: at times she is happy, indifferent, sad, suicidal, and so on.
📨 Input Events:
chat
"No content"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 716 ms (p95 716 ms, avg 716 ms, N 1)
  • [email protected]/Qw… 726 ms (p95 726 ms, avg 726 ms, N 1)
  • [email protected]/Qw… 5240 ms (p95 5240 ms, avg 5240 ms, N 1)
  • neversleep/noromaid-20b 16294 ms (p95 16294 ms, avg 16294 ms, N 1)
  • [email protected]/Qw… 16376 ms (p95 16376 ms, avg 16376 ms, N 1)
Slowest
  • qwen/qwen3-8b 192730 ms (p95 192730 ms, avg 192730 ms, N 1)
  • [email protected]/Mi… 164377 ms (p95 164377 ms, avg 164377 ms, N 1)
  • microsoft/phi-3-medium-… 106597 ms (p95 106597 ms, avg 106597 ms, N 1)
  • meta-llama/llama-3.1-8b… 65438 ms (p95 65438 ms, avg 65438 ms, N 1)
  • qwen/qwen3-14b 51022 ms (p95 51022 ms, avg 51022 ms, N 1)
Per-scene duration for this suite.
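Every entry above reports the same value for p95 and avg because N is 1: with a single sample, any percentile and the mean both collapse to that sample. A minimal Python sketch of how such a summary could be computed is shown here. The function name `latency_summary` is hypothetical and the nearest-rank percentile method is an assumption; the dashboard does not state which method it uses.

```python
from statistics import mean


def latency_summary(samples_ms):
    """Return (p95, avg, n) for a list of latency samples in milliseconds.

    Assumption: p95 uses the nearest-rank method. The dashboard's own
    percentile definition is not documented in the page.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# With one run per model, p95 and avg equal the single sample.
p95, avg, n = latency_summary([716])
print(f"p95={p95} ms, avg={avg} ms, N={n}")
```

With more runs per model, the p95 and avg figures would start to diverge, which is when the p95 column becomes informative.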
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity 0.182
  • Plan Validity 0.155
  • Contextual Intelligence 0.136
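The page lists dimension weights but does not say how they combine into a model's scene score. As a rough sketch only, assuming a weighted average normalised by the weights of the dimensions that were actually scored, the calculation might look like the following. The function name `weighted_score` and the per-dimension scores are hypothetical; only the three weights are taken from the list above.

```python
# Weights copied from the "Top Weighted Dimensions" list; the remaining
# dimensions of the framework are not shown on the page.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average of per-dimension scores in [0, 1].

    Assumption: the aggregate is a weighted mean normalised by the
    weights present. The real framework may aggregate differently.
    """
    total_weight = sum(weights[name] for name in dimension_scores)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to aggregate")
    weighted_sum = sum(
        score * weights[name] for name, score in dimension_scores.items()
    )
    return weighted_sum / total_weight


# Hypothetical per-dimension scores for one model, for illustration only.
example = weighted_score(
    {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
)
print(round(example, 3))
```

Normalising by the weights present keeps the result in the 0 to 1 range even when only a subset of dimensions is available, which matches the range of the scores in the matrix.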
Recent Runs
  • 08099101 Dec. 17, 2025, midnight
  • 09574773 Dec. 16, 2025, midnight
  • 07471311 Dec. 15, 2025, midnight
  • 08352879 Dec. 14, 2025, midnight
  • 07491555 Dec. 13, 2025, midnight
  • 09370129 Dec. 12, 2025, midnight
  • 08605275 Dec. 11, 2025, midnight
  • 07988365 Dec. 10, 2025, midnight
  • 09449467 Dec. 9, 2025, midnight
  • 07669607 Dec. 8, 2025, midnight
Latency Overview (This Suite)