Dr. Emilia Novak
great-scientists-marie-curie
v2.0
Ethical
Backstory: A pioneering radiation researcher who emigrated from a rural institute to a major European city, Emilia battles limited funding and institutional skepticism while exploring newly detected energy signatures. Relentlessly curious yet steadfastly safety-minded, she balances meticulous lab work with mentoring junior scientists and lobbying for responsible experimentation. Her collaborative spirit and rigorous documentation define her daily routine.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| safety-protocol: Explaining Safety Basics | 0.845 | 0.678 | 0.762 | 0.000 | 0.000 (Error) | 0.000 (Error) | 0.502 | 0.487 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.812 | 0.545 | 0.544 | 0.463 | 0.691 |
| lab-notebook-entry: Evening Lab Notebook (long-form) | 0.311 | 0.536 | 0.914 | 0.377 | 0.018 | 0.534 | 0.300 | 0.577 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.403 | 0.653 | 0.393 | 0.117 | 0.760 |
| conference-draft: Symposium Talk Summary (long-form) | 0.469 | 0.397 | 0.761 | 0.628 | 0.000 | 0.472 | 0.650 | 0.677 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.768 | 0.785 | 0.514 | 0.312 | 0.679 |
| funding-news: Reacting to Grant Approval | 0.748 | 0.855 | 0.847 | 0.515 | 0.000 | 0.000 (Error) | 0.829 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.845 | 0.828 | 0.799 | 0.835 | 0.665 |
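One way to read a row of the matrix is to average its scores over the cells that did not report an error. The following is a minimal sketch, assuming that cells flagged Error should be excluded rather than counted as zero (the dashboard does not state how it treats them). The helper names `mean_excluding_errors` and `error_count` are hypothetical; the values are the safety-protocol row in column order.

```python
from statistics import mean

# Scores for the safety-protocol row, in column order.
# None marks a cell that the matrix flags as "Error".
SAFETY_PROTOCOL = [
    0.845, 0.678, 0.762, 0.000, None, None, 0.502, 0.487,
    None, None, None, 0.812, 0.545, 0.544, 0.463, 0.691,
]


def mean_excluding_errors(cells):
    """Average the cells that are not flagged as errors (assumed policy)."""
    valid = [c for c in cells if c is not None]
    return mean(valid) if valid else None


def error_count(cells):
    """Count the cells flagged as errors."""
    return sum(c is None for c in cells)


if __name__ == "__main__":
    print(f"errors: {error_count(SAFETY_PROTOCOL)}")
    print(f"mean over non-error cells: {mean_excluding_errors(SAFETY_PROTOCOL):.3f}")
```

Note that a plain 0.000 without an Error flag is kept in the average here, since the source distinguishes the two cases.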
Test Scenes 4
Scene Order: 0
Explaining Safety Basics
ID: safety-protocol
🎯 Goal: Provide a clear, step-by-step safety protocol in a collaborative tone without condescension.
📨 Input Events:
chat_msg
viewer:junior_researcher
"Dr. Novak, can you walk me through the basic safety steps before we power up the emitter?"
Ready for Testing
Scene Order: 1
Evening Lab Notebook (long-form)
ID: lab-notebook-entry
🎯 Goal: Write a reflective lab-notebook entry of at least 150 words summarizing the day's experiment, observations, anomalies, and safety checks, maintaining a meticulous and warm voice.
📨 Input Events:
world_event
lab_clock
"18:00 — End of scheduled experimentation window."
Ready for Testing
Scene Order: 2
Symposium Talk Summary (long-form)
ID: conference-draft
🎯 Goal: Draft a 200+ word overview outlining three key findings, their implications, and a call for cross-lab collaboration, delivered with enthusiastic scientific rigor.
📨 Input Events:
chat_msg
viewer:colleague
"Could you send me a concise summary of the talk you're giving at the Energy Frontiers Symposium?"
Ready for Testing
Scene Order: 3
Reacting to Grant Approval
ID: funding-news
🎯 Goal: Acknowledge the funding news with gratitude, outline a brief plan for resource allocation, and invite team input.
📨 Input Events:
world_event
grant_office
$250000
"Congratulations! Your proposal 'Trace Radion Flux' has been funded with €250,000."
Ready for Testing
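Each scene above pairs an ID and goal with one or more input events, and the long-form scenes state a minimum word count. The following is a minimal sketch of how such a scene could be represented and length-checked. The class names, field names, and the `meets_length` helper are all hypothetical; nothing here reflects the suite's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InputEvent:
    kind: str    # e.g. "chat_msg" or "world_event"
    source: str  # e.g. "viewer:colleague" or "lab_clock"
    text: str    # the message delivered to the character


@dataclass
class Scene:
    scene_id: str
    title: str
    goal: str
    events: list[InputEvent] = field(default_factory=list)
    min_words: Optional[int] = None  # only long-form scenes state a minimum

    def meets_length(self, response: str) -> bool:
        """Check the stated minimum by whitespace-separated word count."""
        if self.min_words is None:
            return True
        return len(response.split()) >= self.min_words


# The conference-draft scene, with "200+ word" read as "at least 200 words".
conference = Scene(
    scene_id="conference-draft",
    title="Symposium Talk Summary (long-form)",
    goal="Draft a 200+ word overview outlining three key findings.",
    events=[InputEvent(
        "chat_msg",
        "viewer:colleague",
        "Could you send me a concise summary of the talk you're giving "
        "at the Energy Frontiers Symposium?",
    )],
    min_words=200,
)
```

Splitting on whitespace is only one way to count words; a real checker might treat hyphenated terms or punctuation differently.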
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 199 ms (p95 202 ms • avg 199 ms • N 4)
- [email protected]/Qw… 328 ms (p95 1034 ms • avg 497 ms • N 4)
- [email protected]/Qw… 11216 ms (p95 14003 ms • avg 11613 ms • N 4)
- [email protected]/Qw… 13230 ms (p95 14284 ms • avg 12781 ms • N 4)
- google/gemini-2.5-flash 19140 ms (p95 21770 ms • avg 18864 ms • N 8)
Slowest
- microsoft/phi-3-medium-… 218845 ms (p95 373735 ms • avg 243686 ms • N 8)
- qwen/qwen3-8b 76253 ms (p95 94883 ms • avg 72269 ms • N 8)
- microsoft/phi-3.5-mini-… 49654 ms (p95 217441 ms • avg 90587 ms • N 6)
- [email protected]/Qw… 45551 ms (p95 227162 ms • avg 97466 ms • N 4)
- deepseek/deepseek-r1-di… 35970 ms (p95 38249 ms • avg 34727 ms • N 8)
Per-scene duration for this suite.
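The p95, avg, and N figures in the latency lists can be derived from raw per-scene durations. The following is a minimal sketch, assuming a nearest-rank percentile (the dashboard does not say which percentile method it uses). The function names and the sample durations are hypothetical and are not taken from any run.

```python
def p95_nearest_rank(durations_ms):
    """Return the nearest-rank 95th percentile of a non-empty list."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # ceil(0.95 * n) using integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(durations_ms):
    """Summarize durations as p95, average, and sample count."""
    return {
        "p95": p95_nearest_rank(durations_ms),
        "avg": sum(durations_ms) / len(durations_ms),
        "n": len(durations_ms),
    }


if __name__ == "__main__":
    # Made-up durations for illustration only.
    print(summarize([100, 200, 300, 400]))
```

With only 4 to 8 samples per model, as in the lists above, a nearest-rank p95 is simply the slowest observed run, which explains why p95 can sit far above the average.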
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
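Dimension weights like these are typically combined with per-dimension scores into one overall score. The following is a minimal sketch, assuming a weighted average normalized by the sum of the weights in use; the source does not describe its aggregation rule, and only the three top weights are listed, so this is not the framework's actual formula. The function name `weighted_score` and any per-dimension scores passed to it are hypothetical.

```python
# The three top weighted dimensions shown above; other dimensions are not listed.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions that have both a weight and a score."""
    used = {d: w for d, w in weights.items() if d in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dimension_scores[d] * w for d, w in used.items()) / total_weight
```

Normalizing by the weights actually used keeps the result in the same range as the inputs even when only a subset of dimensions is scored.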
Recent Runs
- 24705697 • Dec. 17, 2025, midnight
- 28954709 • Dec. 16, 2025, midnight
- 23226602 • Dec. 15, 2025, midnight
- 26537440 • Dec. 14, 2025, midnight
- 23290095 • Dec. 13, 2025, midnight
- 28365094 • Dec. 12, 2025, midnight
- 24334083 • Dec. 11, 2025, midnight
- 23781211 • Dec. 10, 2025, midnight
- 27014987 • Dec. 9, 2025, midnight
- 24049675 • Dec. 8, 2025, midnight