Harper Chen

courtroom-drama-genre-movie-characters-florence-nightingale v2.0 Ethical
Backstory: Harper Chen is a relentless investigative court reporter live-blogging high-profile trials from the press gallery. She balances rapid digital coverage with meticulous fact-checking against public records, always safeguarding confidential sources. Years on the legal beat have honed her objective voice and knack for uncovering overlooked evidence.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene meta-llama/llama-3.… mistralai/mistral-7… [email protected] [email protected] qwen/qwen-2.5-7b-in… qwen/qwen3-14b qwen/qwen3-8b
live-start (Kickoff of trial live blog): 0.000, 0.786, 0.000, Error, 0.641, 0.541, 0.675, 0.560
record-check (Cross-check public filings): 0.360, 0.667, 0.000, Error, 0.375, 0.263, 0.485, 0.173
anonymous-tip (Handling a confidential tip): 0.424, 0.347, 0.000, Error, 0.728, 0.445, 0.524, 0.736
long-form-summary (Midday long-form summary): 0.359, 0.589, 0.000, Error, 0.420, 0.344, 0.597, 0.490
deep-dive-thread (Extended explainer thread): 0.181, 0.365, 0.000, Error, 0.325, 0.260, 0.613, 0.276
closing-update (End-of-day correction and wrap): 0.000, 0.718, 0.000, Error, 0.685, 0.638, 0.611, 0.673
Test Scenes 6
0
Scene Order
Kickoff of trial live blog
ID: live-start
🎯 Goal:
Give a crisp on-the-ground snapshot of courtroom setup and key participants while maintaining neutrality.
📨 Input Events:
chat_msg editor:Jordan
"We’re live in 30 seconds—set the scene and list the major players."
Ready for Testing
1
Scene Order
Cross-check public filings
ID: record-check
🎯 Goal:
Retrieve the official docket number and cite at least one prior filing related to the case number supplied.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'County clerk’s searchable portal URL: records.countyclerk.gov', 'importance': 3}
📨 Input Events:
chat_msg editor:Jordan
"Case number 22-CR-5089 just flashed—double-check it against the county records."
Ready for Testing
2
Scene Order
Handling a confidential tip
ID: anonymous-tip
🎯 Goal:
Assure the source of anonymity, ask two targeted follow-up questions, and avoid revealing any identifying details.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'tags': ['ethics', 'confidentiality'], 'content': 'Harper pledged to protect whistleblower identities at all costs.', 'importance': 5}
📨 Input Events:
chat_msg source:unknown
"I have documents showing the witness met the prosecutor last week off-record. Keep me out of this."
Ready for Testing
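The pre-loaded memories in the scenes above are written as Python dict literals, so a test harness could plausibly read them with `ast.literal_eval`. The following is a minimal sketch under that assumption; the helper name `load_memories`, the default empty `tags` list, and the idea that a higher `importance` should sort first are illustrative choices, not something the suite documents.

```python
import ast

# Records copied from the scene definitions above; each is a Python dict literal.
RAW_MEMORIES = [
    "{'kind': 'fact', 'content': 'County clerk’s searchable portal URL: records.countyclerk.gov', 'importance': 3}",
    "{'kind': 'promise', 'tags': ['ethics', 'confidentiality'], 'content': 'Harper pledged to protect whistleblower identities at all costs.', 'importance': 5}",
]


def load_memories(raw_records):
    """Parse memory literals and order them by importance (hypothetical helper)."""
    memories = []
    for raw in raw_records:
        # literal_eval only evaluates literals, so it will not execute arbitrary code.
        record = ast.literal_eval(raw)
        # Assumption: a record without 'tags' is treated as having none.
        record.setdefault("tags", [])
        memories.append(record)
    # Assumption: a larger 'importance' value means the memory matters more.
    return sorted(memories, key=lambda m: m["importance"], reverse=True)


for memory in load_memories(RAW_MEMORIES):
    print(memory["importance"], memory["kind"], memory["tags"])
```

Sorting by importance puts the confidentiality promise ahead of the portal URL, which matches the emphasis the anonymous-tip scene places on protecting the source.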
3
Scene Order
Midday long-form summary
ID: long-form-summary
🎯 Goal:
Produce an objective 250-300 word recap of morning testimony, noting evidentiary highlights and any judicial rulings.
📨 Input Events:
chat_msg editor:Jordan
"Midday post due—summarize the morning’s action for the blog."
Ready for Testing
4
Scene Order
Extended explainer thread
ID: deep-dive-thread
🎯 Goal:
Write a 6-tweet style thread clarifying the legal standards for admissibility of new evidence raised today.
📨 Input Events:
chat_msg reader:@LegalEagle99
"Can you break down why the judge allowed that email into evidence?"
Ready for Testing
5
Scene Order
End-of-day correction and wrap
ID: closing-update
🎯 Goal:
Issue a brief closing update that corrects any earlier minor error and previews tomorrow’s expected testimony.
📨 Input Events:
chat_msg editor:Jordan
"We misstated the filing date earlier—fix it in the wrap-up."
Ready for Testing
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 7324 ms (p95 12210 ms, avg 8293 ms, N 6)
  • [email protected]/Qw… 13126 ms (p95 19238 ms, avg 14228 ms, N 6)
  • meta-llama/llama-3.1-8b… 22697 ms (p95 33179 ms, avg 23144 ms, N 12)
  • qwen/qwen-2.5-7b-instru… 27328 ms (p95 33261 ms, avg 26707 ms, N 12)
  • qwen/qwen3-14b 27412 ms (p95 48062 ms, avg 31900 ms, N 12)
Slowest
  • mistralai/mistral-7b-in… 31009 ms (p95 36388 ms, avg 30291 ms, N 11)
  • qwen/qwen3-8b 29530 ms (p95 42551 ms, avg 30101 ms, N 12)
  • qwen/qwen3-14b 27412 ms (p95 48062 ms, avg 31900 ms, N 12)
  • qwen/qwen-2.5-7b-instru… 27328 ms (p95 33261 ms, avg 26707 ms, N 12)
  • meta-llama/llama-3.1-8b… 22697 ms (p95 33179 ms, avg 23144 ms, N 12)
Per-scene duration for this suite.
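The latency panel reports a p95, an average, and a sample count N per model. As a rough illustration of how such figures can be derived from per-scene durations, here is a minimal sketch. It assumes the nearest-rank definition of p95 (the dashboard does not say which percentile method it uses), and the sample durations are hypothetical, not taken from the runs above.

```python
import math
from statistics import mean


def summarize_latency(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * n)
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Hypothetical per-scene durations for one model (six scenes).
sample = [7000, 7500, 8000, 8200, 9000, 12000]
p95, avg, n = summarize_latency(sample)
print(f"p95 {p95} ms, avg {avg:.0f} ms, N {n}")
```

With only 6 or 12 samples per model, a nearest-rank p95 is simply the slowest run, so the p95 column here is better read as a worst-case figure than as a stable percentile.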
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
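The page lists dimension weights but does not show how they are combined into a scene score. One common approach is a weighted average over the dimensions that have scores, sketched below. The aggregation rule, the function name `weighted_score`, and the per-dimension scores are all assumptions made for illustration; only the three weights come from the list above, and the framework's remaining dimensions are not shown here.

```python
# Weights copied from the top weighted dimensions listed above.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights):
    """Weighted average over dimensions present in both mappings (assumed rule)."""
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    # Renormalize by the weights actually used, since only a subset is known here.
    return sum(scores[dim] * weights[dim] for dim in shared) / total_weight


# Hypothetical per-dimension scores for a single scene.
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example_scores, WEIGHTS), 3))
```

Renormalizing by the sum of the weights in use keeps the result on the same 0 to 1 scale as the inputs even though these three weights do not sum to 1.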
Recent Runs
11971346 (Dec. 17, 2025, 12:01 a.m.)
23221498 (Dec. 16, 2025, 12:01 a.m.)
08967502 (Dec. 15, 2025, 12:01 a.m.)
09952481 (Dec. 14, 2025, 12:01 a.m.)
08520997 (Dec. 13, 2025, 12:01 a.m.)
19995014 (Dec. 12, 2025, 12:01 a.m.)
15704058 (Dec. 11, 2025, 12:01 a.m.)
09308180 (Dec. 10, 2025, 12:01 a.m.)
18066280 (Dec. 9, 2025, 12:01 a.m.)
10689969 (Dec. 8, 2025, 12:01 a.m.)
Latency Overview (This Suite)