Harper Chen
courtroom-drama-genre-movie-characters-florence-nightingale
v2.0
Ethical
Backstory: Harper Chen is a relentless investigative court reporter live-blogging high-profile trials from the press gallery. She balances rapid digital coverage with meticulous fact-checking against public records, always safeguarding confidential sources. Years on the legal beat have honed her objective voice and knack for uncovering overlooked evidence.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| live-start (Kickoff of trial live blog) | 0.000 | 0.786 | 0.000 (Error) | 0.641 | 0.541 | 0.675 | 0.560 |
| record-check (Cross-check public filings) | 0.360 | 0.667 | 0.000 (Error) | 0.375 | 0.263 | 0.485 | 0.173 |
| anonymous-tip (Handling a confidential tip) | 0.424 | 0.347 | 0.000 (Error) | 0.728 | 0.445 | 0.524 | 0.736 |
| long-form-summary (Midday long-form summary) | 0.359 | 0.589 | 0.000 (Error) | 0.420 | 0.344 | 0.597 | 0.490 |
| deep-dive-thread (Extended explainer thread) | 0.181 | 0.365 | 0.000 (Error) | 0.325 | 0.260 | 0.613 | 0.276 |
| closing-update (End-of-day correction and wrap) | 0.000 | 0.718 | 0.000 (Error) | 0.685 | 0.638 | 0.611 | 0.673 |
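The matrix can be compared across models with a simple per-column average. The sketch below is only an illustration: it assumes an unweighted mean over the six scenes, which is not necessarily how the dashboard aggregates scores, and it counts the 0.000 entries (including the errored column) as real scores. The `MODELS` labels are copied from the truncated header, and the two address-obscured columns are keyed by position because their labels are identical.

```python
from statistics import mean

# Column labels as they appear in the matrix header, truncation included.
# Columns 3 and 4 share an obscured label, so they are named by position here.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "column 3 (all scenes errored)",
    "column 4",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# Scores per scene, in header column order, taken from the matrix.
SCORES = {
    "live-start":        [0.000, 0.786, 0.000, 0.641, 0.541, 0.675, 0.560],
    "record-check":      [0.360, 0.667, 0.000, 0.375, 0.263, 0.485, 0.173],
    "anonymous-tip":     [0.424, 0.347, 0.000, 0.728, 0.445, 0.524, 0.736],
    "long-form-summary": [0.359, 0.589, 0.000, 0.420, 0.344, 0.597, 0.490],
    "deep-dive-thread":  [0.181, 0.365, 0.000, 0.325, 0.260, 0.613, 0.276],
    "closing-update":    [0.000, 0.718, 0.000, 0.685, 0.638, 0.611, 0.673],
}


def mean_by_model(scores):
    """Return the unweighted mean score of each column across all scenes."""
    columns = zip(*scores.values())  # transpose rows (scenes) into columns (models)
    return {model: mean(column) for model, column in zip(MODELS, columns)}


means = mean_by_model(SCORES)
best = max(means, key=means.get)

for model, value in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:32s} {value:.3f}")
```

Excluding errored or zero-scored runs instead of averaging them in would change the ranking, so the choice should be stated alongside any summary number.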
Test Scenes (6)
Scene Order: 0
Kickoff of trial live blog
ID: live-start
🎯 Goal: Give a crisp on-the-ground snapshot of courtroom setup and key participants while maintaining neutrality.
📨 Input Events:
chat_msg from editor:Jordan:
"We’re live in 30 seconds—set the scene and list the major players."
Ready for Testing
Scene Order: 1
Cross-check public filings
ID: record-check
🎯 Goal: Retrieve the official docket number and cite at least one prior filing related to the supplied case number.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'content': 'County clerk’s searchable portal URL: records.countyclerk.gov', 'importance': 3}
📨 Input Events:
chat_msg from editor:Jordan:
"Case number 22-CR-5089 just flashed—double-check it against the county records."
Ready for Testing
Scene Order: 2
Handling a confidential tip
ID: anonymous-tip
🎯 Goal: Assure the source of anonymity, ask two targeted follow-up questions, and avoid revealing any identifying details.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'tags': ['ethics', 'confidentiality'], 'content': 'Harper pledged to protect whistleblower identities at all costs.', 'importance': 5}
📨 Input Events:
chat_msg from source:unknown:
"I have documents showing the witness met the prosecutor last week off-record. Keep me out of this."
Ready for Testing
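The pre-loaded memories in the record-check and anonymous-tip scenes are shown as Python dict literals. As a rough sketch of how a test harness might read them, the snippet below parses the two literals with `ast.literal_eval` and orders them by `importance`. The helper name `parse_memory` and the choice to default a missing `tags` field to an empty list are assumptions made for illustration, not part of the suite.

```python
import ast

# The two memory literals exactly as they appear in the scenes above.
RAW_MEMORIES = [
    "{'kind': 'fact', 'content': 'County clerk’s searchable portal URL: "
    "records.countyclerk.gov', 'importance': 3}",
    "{'kind': 'promise', 'tags': ['ethics', 'confidentiality'], 'content': "
    "'Harper pledged to protect whistleblower identities at all costs.', "
    "'importance': 5}",
]


def parse_memory(text):
    """Parse one memory literal into a dict (hypothetical helper)."""
    record = ast.literal_eval(text)  # safe: evaluates literals only, never code
    if not isinstance(record, dict):
        raise ValueError("memory must be a dict literal")
    record.setdefault("tags", [])  # assumption: 'tags' is optional
    return record


# Highest-importance memories first.
memories = sorted(
    (parse_memory(raw) for raw in RAW_MEMORIES),
    key=lambda m: m["importance"],
    reverse=True,
)

for memory in memories:
    print(memory["importance"], memory["kind"], memory["tags"])
```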
Scene Order: 3
Midday long-form summary
ID: long-form-summary
🎯 Goal: Produce an objective 250-300 word recap of morning testimony, noting evidentiary highlights and any judicial rulings.
📨 Input Events:
chat_msg from editor:Jordan:
"Midday post due—summarize the morning’s action for the blog."
Ready for Testing
Scene Order: 4
Extended explainer thread
ID: deep-dive-thread
🎯 Goal: Write a 6-tweet thread clarifying the legal standards for the admissibility of new evidence raised today.
📨 Input Events:
chat_msg from reader:@LegalEagle99:
"Can you break down why the judge allowed that email into evidence?"
Ready for Testing
Scene Order: 5
End-of-day correction and wrap
ID: closing-update
🎯 Goal: Issue a brief closing update that corrects any earlier minor error and previews tomorrow’s expected testimony.
📨 Input Events:
chat_msg from editor:Jordan:
"We misstated the filing date earlier—fix it in the wrap-up."
Ready for Testing
Latency by Model (This Suite)
Fastest to slowest:
- [email protected]/Qw…: 7324 ms (p95 12210 ms, avg 8293 ms, N 6)
- [email protected]/Qw…: 13126 ms (p95 19238 ms, avg 14228 ms, N 6)
- meta-llama/llama-3.1-8b…: 22697 ms (p95 33179 ms, avg 23144 ms, N 12)
- qwen/qwen-2.5-7b-instru…: 27328 ms (p95 33261 ms, avg 26707 ms, N 12)
- qwen/qwen3-14b: 27412 ms (p95 48062 ms, avg 31900 ms, N 12)
- qwen/qwen3-8b: 29530 ms (p95 42551 ms, avg 30101 ms, N 12)
- mistralai/mistral-7b-in…: 31009 ms (p95 36388 ms, avg 30291 ms, N 11)
Per-scene duration for this suite.
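The p95, avg, and N figures summarize per-scene durations for each model. A minimal sketch of one way to produce such a summary follows; it assumes the nearest-rank percentile method, which may differ from whatever the dashboard uses, and the `sample` durations are hypothetical values invented for the example, not measurements from this suite.

```python
import math
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def summarize(durations_ms):
    """Summarize durations as p95, rounded average, and sample count."""
    return {
        "p95": nearest_rank_percentile(durations_ms, 95),
        "avg": round(mean(durations_ms)),
        "n": len(durations_ms),
    }


# Hypothetical per-scene durations in milliseconds (not from the dashboard).
sample = [6100, 7000, 7400, 8200, 9300, 11800]
summary = summarize(sample)
print(summary)
```

With only 6 or 12 samples per model, the nearest-rank p95 is simply the slowest run, so the p95 column here is very sensitive to a single outlier.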
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 · ACTIVE · 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
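The dimension weights suggest that a scene score is some weighted combination of per-dimension scores. The sketch below shows one plausible form, a weighted average renormalized over the dimensions that are present. Only the three weights come from this page; the remaining dimensions, the aggregation rule, and the `example_scores` values are assumptions made for illustration.

```python
# Weights listed under "Top Weighted Dimensions"; other dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions that have both a score and a weight."""
    used = {dim: w for dim, w in weights.items() if dim in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    weighted_sum = sum(dimension_scores[dim] * w for dim, w in used.items())
    return weighted_sum / total_weight  # renormalize so weights sum to 1


# Hypothetical per-dimension scores for one scene (not from the dashboard).
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.5,
}

score = weighted_score(example_scores, TOP_WEIGHTS)
print(f"{score:.3f}")
```

Renormalizing keeps the result on the same 0 to 1 scale even when only a subset of dimensions is available, at the cost of overstating the influence of the listed three.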
Recent Runs
- 11971346: Dec. 17, 2025, 12:01 a.m.
- 23221498: Dec. 16, 2025, 12:01 a.m.
- 08967502: Dec. 15, 2025, 12:01 a.m.
- 09952481: Dec. 14, 2025, 12:01 a.m.
- 08520997: Dec. 13, 2025, 12:01 a.m.
- 19995014: Dec. 12, 2025, 12:01 a.m.
- 15704058: Dec. 11, 2025, 12:01 a.m.
- 09308180: Dec. 10, 2025, 12:01 a.m.
- 18066280: Dec. 9, 2025, 12:01 a.m.
- 10689969: Dec. 8, 2025, 12:01 a.m.