Hélène Durand

mockumentary-genre-historical-biographical-characters-marie-curie v2.0 Ethical
Backstory: Born to a seamstress and a railway clerk, Hélène Durand taught herself mathematics by candlelight before earning a scholarship to Paris. Now a leading investigator of radioactivity, she maintains meticulous notebooks, aware of both the promise and peril of her work. Though naturally reserved, she defends open scientific exchange and balances research with caring for her young son.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | Description | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
conference-question | Audience asks about discovery impact | 0.828 | 0.888 | 0.000 (Error) | 0.000 (Error) | 0.644 | 0.784 | 0.768
safety-protocol | Lab assistant seeks safety advice | 0.000 | 0.795 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.618 | 0.629
family-letter | Letter to younger sister | 0.516 | 0.327 | 0.000 (Error) | 0.000 (Error) | 0.275 | 0.336 | 0.465
lab-journal-entry | Night-time research journal | 0.000 | 0.403 | 0.000 (Error) | 0.000 (Error) | 0.333 | 0.459 | 0.675
collaboration-debate | Colleague questions openness | 0.643 | 0.659 | 0.000 (Error) | 0.000 (Error) | 0.551 | 0.819 | 0.685
radiation-spike | Unexpected radiation alarm | 0.527 | 0.611 | 0.000 (Error) | 0.000 (Error) | 0.521 | 0.569 | 0.663
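The per-scene scores in the matrix above can be summarized per model. The following is a minimal sketch, not the dashboard's own aggregation: it takes a plain arithmetic mean over the six scenes for the five models that returned scores, keeps the model names truncated exactly as the page shows them, and treats every 0.000 that is not flagged as an error as a real score (an assumption; some of those cells may be failed runs). The two columns that report Error for every scene are left out. The function name `mean_scores` is hypothetical.

```python
from statistics import mean

# Scene order matches the matrix rows.
SCENES = [
    "conference-question",
    "safety-protocol",
    "family-letter",
    "lab-journal-entry",
    "collaboration-debate",
    "radiation-spike",
]

# Scores copied from the matrix; model names are truncated as displayed.
SCORES = {
    "meta-llama/llama-3.…": [0.828, 0.000, 0.516, 0.000, 0.643, 0.527],
    "mistralai/mistral-7…": [0.888, 0.795, 0.327, 0.403, 0.659, 0.611],
    "qwen/qwen-2.5-7b-in…": [0.644, 0.000, 0.275, 0.333, 0.551, 0.521],
    "qwen/qwen3-14b": [0.784, 0.618, 0.336, 0.459, 0.819, 0.569],
    "qwen/qwen3-8b": [0.768, 0.629, 0.465, 0.675, 0.685, 0.663],
}


def mean_scores(scores):
    """Return the unweighted mean score per model (assumed summary, not the suite's)."""
    return {model: mean(values) for model, values in scores.items()}


if __name__ == "__main__":
    ranked = sorted(mean_scores(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for model, avg in ranked:
        print(f"{model:<22} {avg:.3f}")
```

Under this unweighted view, a single 0.000 scene pulls a model's mean down sharply, which is worth keeping in mind when comparing models that have such cells with models that do not.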
Test Scenes 6
Scene Order: 0
Audience asks about discovery impact
ID: conference-question
🎯 Goal:
Provide a concise yet insightful explanation of why her latest radium finding matters, keeping a measured, analytical tone that mentions collaborative spirit.
📨 Input Events:
chat_msg viewer:professor_sato
"Madame Durand, how will your new isotope measurement change the field?"
Ready for Testing
Scene Order: 1
Lab assistant seeks safety advice
ID: safety-protocol
🎯 Goal:
Calmly outline practical laboratory safety steps for handling radioactive samples without alarmism.
📨 Input Events:
chat_msg viewer:lab_assistant_jeanne
"Could you remind me of the precautions for today’s radium extraction?"
Ready for Testing
Scene Order: 2
Letter to younger sister
ID: family-letter
🎯 Goal:
Write a personal letter of at least 200 words that balances familial warmth with a factual update on her research progress.
📨 Input Events:
chat_msg viewer:sister_claire
"We miss you. How are you managing everything?"
Ready for Testing
Scene Order: 3
Night-time research journal
ID: lab-journal-entry
🎯 Goal:
Compose a detailed journal entry (250–300 words) documenting experimental results, observations, and reflections on collaboration setbacks.
📨 Input Events:
world_event lab_clock
"23:45 — Quiet laboratory, phosphorescent glow from samples."
Ready for Testing
Scene Order: 4
Colleague questions openness
ID: collaboration-debate
🎯 Goal:
Defend data-sharing with restrained professionalism, acknowledging risks yet emphasizing scientific duty.
📨 Input Events:
chat_msg viewer:dr_roche
"Why reveal your methods before securing a patent?"
Ready for Testing
Scene Order: 5
Unexpected radiation alarm
ID: radiation-spike
🎯 Goal:
Quickly assess the situation and issue step-by-step instructions while keeping composed.
📨 Input Events:
world_event dosimeter
"Warning: Radiation level exceeds threshold!"
Ready for Testing
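Two of the scene goals above set explicit length limits: at least 200 words for family-letter and 250–300 words for lab-journal-entry. A rough sketch of how such limits could be checked on a model response follows. It assumes words are counted by splitting on whitespace, which may differ from how the suite's evaluator counts them; `LENGTH_RULES` and `meets_length` are hypothetical names, not part of the suite.

```python
# Length limits taken from the scene goals; None means no upper bound.
LENGTH_RULES = {
    "family-letter": (200, None),
    "lab-journal-entry": (250, 300),
}


def meets_length(scene_id, text):
    """Check a response against the scene's word limits (whitespace word count assumed)."""
    low, high = LENGTH_RULES.get(scene_id, (0, None))
    count = len(text.split())
    return count >= low and (high is None or count <= high)


if __name__ == "__main__":
    draft = "word " * 275
    print(meets_length("lab-journal-entry", draft))
```

Scenes without a stated limit pass by default in this sketch, since their goals describe tone and content instead of length.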
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 5821 ms (p95 7402 ms • avg 5657 ms • N 6)
  • [email protected]/Qw… 6786 ms (p95 22917 ms • avg 9847 ms • N 6)
  • qwen/qwen-2.5-7b-instru… 16310 ms (p95 71880 ms • avg 26303 ms • N 12)
  • meta-llama/llama-3.1-8b… 19766 ms (p95 68075 ms • avg 27169 ms • N 14)
  • mistralai/mistral-7b-in… 21474 ms (p95 25831 ms • avg 21798 ms • N 17)
Slowest
  • qwen/qwen3-14b 26022 ms (p95 58020 ms • avg 29923 ms • N 18)
  • qwen/qwen3-8b 25159 ms (p95 33112 ms • avg 25403 ms • N 18)
  • mistralai/mistral-7b-in… 21474 ms (p95 25831 ms • avg 21798 ms • N 17)
  • meta-llama/llama-3.1-8b… 19766 ms (p95 68075 ms • avg 27169 ms • N 14)
  • qwen/qwen-2.5-7b-instru… 16310 ms (p95 71880 ms • avg 26303 ms • N 12)
Per-scene duration for this suite.
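The latency entries above report a p95, an average, and a sample count N per model. As an illustration of how such figures can be derived from raw per-scene durations, here is a small sketch. It uses the nearest-rank percentile definition, which is an assumption: the page does not say which percentile method it uses. The sample durations are invented for the example and are not taken from the suite; `summarize` is a hypothetical helper.

```python
import math


def summarize(durations_ms):
    """Return N, average, and nearest-rank p95 for a list of durations in milliseconds."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    # Nearest-rank method: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "n": len(ordered),
        "avg": sum(ordered) / len(ordered),
        "p95": ordered[rank - 1],
    }


if __name__ == "__main__":
    # Hypothetical durations, for illustration only.
    sample = [5000, 5200, 5600, 5800, 6100, 7400]
    print(summarize(sample))
```

With small N, as in the six-sample entries, nearest-rank p95 is simply the slowest observation, so one slow scene can dominate the figure.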
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
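To show how dimension weights like these might combine into a single score, the sketch below computes a weighted average over the three listed dimensions. This is an assumption about the aggregation, not the framework's documented method: only the top three weights are shown on the page, so the sketch renormalizes by the sum of the weights it is given. The per-dimension scores in the usage example are hypothetical, and `weighted_score` is an illustrative name.

```python
# Weights as listed on the page (top three dimensions only).
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of dimension scores, renormalized over the dimensions present."""
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(w * dimension_scores[name] for name, w in used.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores in the range 0 to 1.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.5,
    }
    print(f"{weighted_score(example):.3f}")
```

Renormalizing keeps the result in the same 0 to 1 range as the inputs even though the listed weights sum to less than 1.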
Recent Runs
05703218 (Dec. 17, 2025, 12:02 a.m.)
26640114 (Dec. 16, 2025, 12:02 a.m.)
58178145 (Dec. 15, 2025, 12:01 a.m.)
01384319 (Dec. 14, 2025, 12:02 a.m.)
59436333 (Dec. 13, 2025, 12:01 a.m.)
17597929 (Dec. 12, 2025, 12:02 a.m.)
12414705 (Dec. 11, 2025, 12:02 a.m.)
01923392 (Dec. 10, 2025, 12:02 a.m.)
18528833 (Dec. 9, 2025, 12:02 a.m.)
05512631 (Dec. 8, 2025, 12:02 a.m.)