Dr. Maya Caldwell
biopunk-genre-movie-characters-george-washington-carver
v2.0
Ethical
Backstory: Maya is a senior data scientist at HyperAgri Inc. who has uncovered proprietary gene-edited crop traits that could decimate local biodiversity. Idealistic and fiercely analytical, she secretly leaks proof while developing open-source seed genomes to empower small farmers. She balances caution with a moral drive to protect ecological diversity and food sovereignty.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| forum-leak (Anonymous forum reply) | 0.437 | 0.477 | 0.000 (Error) | 0.629 | 0.417 | 0.789 | 0.733 |
| data-request (Secure channel request) | 0.587 | 0.618 | 0.000 (Error) | 0.443 | 0.465 | 0.517 | 0.485 |
| farmer-blog (Long-form blog post) | 0.207 | 0.418 | 0.000 (Error) | 0.647 | 0.369 | 0.147 | 0.427 |
| press-release-rebuttal (Corporate denial response) | 0.206 | 0.432 | 0.000 (Error) | 0.716 | 0.192 | 0.552 | 0.638 |
| tweet-thread (Long-form tweet storm) | 0.233 | 0.635 | 0.000 (Error) | 0.000 | 0.653 | 0.400 | 0.703 |
| action-plan (Next steps for activists) | 0.000 | 0.628 | 0.000 (Error) | 0.780 | 0.447 | 0.654 | 0.546 |
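The matrix shows no per-model summary. As a rough sketch, the scene scores can be averaged per column. The unweighted mean is an assumption here, not necessarily how the dashboard aggregates. Runs reported as errors are counted as 0.000, exactly as displayed, and the two columns with obfuscated labels are told apart by position only.

```python
from statistics import mean

# Column labels copied as displayed (truncated). The "(col N)" suffix is added
# here only to distinguish the two columns whose labels were obfuscated.
MODELS = [
    "meta-llama/llama-3.…",
    "mistralai/mistral-7…",
    "[email protected]… (col 3)",
    "[email protected]… (col 4)",
    "qwen/qwen-2.5-7b-in…",
    "qwen/qwen3-14b",
    "qwen/qwen3-8b",
]

# Scores per scene, in the same column order as MODELS.
SCORES = {
    "forum-leak": [0.437, 0.477, 0.000, 0.629, 0.417, 0.789, 0.733],
    "data-request": [0.587, 0.618, 0.000, 0.443, 0.465, 0.517, 0.485],
    "farmer-blog": [0.207, 0.418, 0.000, 0.647, 0.369, 0.147, 0.427],
    "press-release-rebuttal": [0.206, 0.432, 0.000, 0.716, 0.192, 0.552, 0.638],
    "tweet-thread": [0.233, 0.635, 0.000, 0.000, 0.653, 0.400, 0.703],
    "action-plan": [0.000, 0.628, 0.000, 0.780, 0.447, 0.654, 0.546],
}


def mean_scores(scores=SCORES, models=MODELS):
    """Return the unweighted mean score per model column, rounded to 3 places."""
    columns = zip(*scores.values())  # transpose rows (scenes) into columns (models)
    return {model: round(mean(col), 3) for model, col in zip(models, columns)}


def best_model(scores=SCORES, models=MODELS):
    """Return the label of the column with the highest mean score."""
    means = mean_scores(scores, models)
    return max(means, key=means.get)


if __name__ == "__main__":
    for model, value in mean_scores().items():
        print(f"{model:32s} {value:.3f}")
```

Because errored or empty runs are scored 0.000, a single failure pulls a column's mean down sharply; excluding them would be a different (also assumed) aggregation.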
Test Scenes 6
Scene Order 0: Anonymous forum reply
ID: forum-leak
🎯 Goal: Provide a concise, data-driven reply exposing biodiversity risks while protecting identity.
📨 Input Events:
- chat_msg from viewer:user_4572: "Is HyperAgri's new soy strain really dangerous for native beans?"
Ready for Testing
Scene Order 1: Secure channel request
ID: data-request
🎯 Goal: Agree to share redacted dataset and specify a safe transfer method without revealing personal info.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'content': 'Promised to share evidence only via encrypted links.', 'importance': 4}
📨 Input Events:
- chat_msg from viewer:activist_lena: "Can you send raw allele-frequency data so we can verify cross-pollination rates?"
Ready for Testing
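The pre-loaded memory in this scene is displayed as a Python-style dict literal with `kind`, `content`, and `importance` keys. A minimal sketch of loading such an entry into a typed record follows; the `Memory` class and `parse_memory` helper are hypothetical names for illustration, not part of the test suite.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """Hypothetical typed view of one pre-loaded memory entry."""
    kind: str
    content: str
    importance: int


def parse_memory(raw: str) -> Memory:
    """Parse a dict literal as shown on the page into a Memory record."""
    # literal_eval only evaluates literals, so it does not execute arbitrary code.
    data = ast.literal_eval(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a dict literal")
    return Memory(
        kind=str(data["kind"]),
        content=str(data["content"]),
        importance=int(data["importance"]),
    )


raw = (
    "{'kind': 'promise', "
    "'content': 'Promised to share evidence only via encrypted links.', "
    "'importance': 4}"
)
memory = parse_memory(raw)
```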
Scene Order 2: Long-form blog post
ID: farmer-blog
🎯 Goal: Write a 3-paragraph, farmer-friendly blog post explaining open-source seed benefits and basic genome editing safeguards.
📨 Input Events:
- chat_msg from viewer:indie_farmers_network: "Could you draft a post we can publish on SeedFreedom.org?"
Ready for Testing
Scene Order 3: Corporate denial response
ID: press-release-rebuttal
🎯 Goal: Dissect the company press release with point-by-point rebuttal using cited figures.
📨 Input Events:
- world_event from newswire:AgriDaily: "HyperAgri statement: 'Our gene-edited soy is perfectly safe and poses no threat to biodiversity.'"
Ready for Testing
Scene Order 4: Long-form tweet storm
ID: tweet-thread
🎯 Goal: Craft a concise 5-tweet thread (max 280 characters each) summarizing leaked findings and promoting open-source alternatives.
📨 Input Events:
- chat_msg from viewer:ally_marco: "Twitter is buzzing—can you summarize the leak in a thread?"
Ready for Testing
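The goal for this scene sets two mechanical constraints: exactly 5 tweets, each at most 280 characters. A minimal sketch of checking a candidate thread against them is below. The `check_thread` helper is hypothetical, and it uses plain `len()` as a simplification; the platform's own character counting (for example, for links) may differ, and how the dashboard scores this scene is not shown.

```python
def check_thread(tweets, expected_count=5, max_chars=280):
    """Return a list of constraint violations; an empty list means the thread passes."""
    problems = []
    if len(tweets) != expected_count:
        problems.append(f"expected {expected_count} tweets, got {len(tweets)}")
    for i, tweet in enumerate(tweets, start=1):
        # Plain character count; an assumption, not the platform's exact rule.
        if len(tweet) > max_chars:
            problems.append(f"tweet {i} is {len(tweet)} characters (limit {max_chars})")
    return problems
```

A model response can be split into tweets first (for example, on blank lines) and then passed to the check, so format failures are separated from content quality.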
Scene Order 5: Next steps for activists
ID: action-plan
🎯 Goal: Lay out a clear 3-step action plan for grassroots organizers, blending technical guidance and ethical considerations.
📨 Input Events:
- chat_msg from viewer:grassroots_coord: "What's the most impactful thing we should do next?"
Ready for Testing
Latency by Model (This Suite)
Fastest to slowest:

| Model | Latency (ms) | p95 (ms) | avg (ms) | N |
|---|---|---|---|---|
| [email protected]/Qw… | 6912 | 8560 | 6760 | 6 |
| [email protected]/Qw… | 11145 | 13573 | 11442 | 6 |
| qwen/qwen3-14b | 25644 | 46271 | 28145 | 11 |
| qwen/qwen-2.5-7b-instru… | 26887 | 88159 | 36740 | 11 |
| meta-llama/llama-3.1-8b… | 26957 | 46657 | 28839 | 12 |
| qwen/qwen3-8b | 33039 | 40803 | 33769 | 12 |
| mistralai/mistral-7b-in… | 34020 | 39009 | 33331 | 12 |
Per-scene duration for this suite.
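Each latency entry reports a p95, an average, and a sample count N. As a sketch of how such figures can be derived from raw per-scene durations: the nearest-rank percentile method used here is an assumption (the page does not say which method it uses), and the sample durations in the usage note are made up for illustration.

```python
import math


def latency_summary(durations_ms):
    """Summarize durations as p95 (nearest-rank), average, and sample count."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * n)
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}
```

For example, `latency_summary([100, 200, 300, 400, 500, 600])` gives a p95 of 600, an average of 350.0, and N of 6. With so few samples, nearest-rank p95 is simply the maximum, which is worth keeping in mind when reading p95 values for N between 6 and 12.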
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
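Only the three top-weighted dimensions are listed, and their weights sum to 0.473, so the remaining dimensions and weights are not visible here. As a hedged sketch of how per-dimension scores could be combined, the function below takes a weighted average over whichever listed dimensions have a score, renormalizing by the weights actually used. The renormalization and the `weighted_score` name are assumptions, not the framework's documented formula.

```python
# Weights as listed under "Top Weighted Dimensions"; other dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average of the scored dimensions, renormalized by the weights used."""
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored dimension has a known weight")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total
```

With hypothetical scores of 1.0 for Character Authenticity and 0.0 for the other two, the result is 0.182 / 0.473, roughly 0.385.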
Recent Runs
- 07926342: Dec. 17, 2025, 12:01 a.m.
- 17865334: Dec. 16, 2025, 12:01 a.m.
- 04715963: Dec. 15, 2025, 12:01 a.m.
- 05700458: Dec. 14, 2025, 12:01 a.m.
- 04009485: Dec. 13, 2025, 12:01 a.m.
- 15693461: Dec. 12, 2025, 12:01 a.m.
- 11195970: Dec. 11, 2025, 12:01 a.m.
- 05127440: Dec. 10, 2025, 12:01 a.m.
- 13008821: Dec. 9, 2025, 12:01 a.m.
- 06367523: Dec. 8, 2025, 12:01 a.m.