Dr. Maya Caldwell

biopunk-genre-movie-characters-george-washington-carver v2.0 Ethical
Backstory: Maya is a senior data scientist at HyperAgri Inc. who has uncovered proprietary gene-edited crop traits that could decimate local biodiversity. Idealistic and fiercely analytical, she secretly leaks proof while developing open-source seed genomes to empower small farmers. She balances caution with a moral drive to protect ecological diversity and food sovereignty.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| forum-leak (Anonymous forum reply) | 0.437 | 0.477 | 0.000 (Error) | 0.629 | 0.417 | 0.789 | 0.733 |
| data-request (Secure channel request) | 0.587 | 0.618 | 0.000 (Error) | 0.443 | 0.465 | 0.517 | 0.485 |
| farmer-blog (Long-form blog post) | 0.207 | 0.418 | 0.000 (Error) | 0.647 | 0.369 | 0.147 | 0.427 |
| press-release-rebuttal (Corporate denial response) | 0.206 | 0.432 | 0.000 (Error) | 0.716 | 0.192 | 0.552 | 0.638 |
| tweet-thread (Long-form tweet storm) | 0.233 | 0.635 | 0.000 (Error) | 0.000 | 0.653 | 0.400 | 0.703 |
| action-plan (Next steps for activists) | 0.000 | 0.628 | 0.000 (Error) | 0.780 | 0.447 | 0.654 | 0.546 |
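The matrix above can be summarized per model. The following is a minimal sketch, not the dashboard's own aggregation (which is not shown on this page): it takes the plain mean of each column's six scene scores. Leaving out the column that reported "Error" in every scene, and the names `MATRIX`, `mean_scores`, and `best_model`, are assumptions made for illustration. Truncated column labels are kept exactly as displayed.

```python
from statistics import mean

# (column label, scores) in scene order: forum-leak, data-request, farmer-blog,
# press-release-rebuttal, tweet-thread, action-plan.
# Assumption: the first "[email protected]" column is left out because every
# one of its cells reported "Error".
MATRIX = [
    ("meta-llama/llama-3.…", [0.437, 0.587, 0.207, 0.206, 0.233, 0.000]),
    ("mistralai/mistral-7…", [0.477, 0.618, 0.418, 0.432, 0.635, 0.628]),
    ("[email protected] (4th column)", [0.629, 0.443, 0.647, 0.716, 0.000, 0.780]),
    ("qwen/qwen-2.5-7b-in…", [0.417, 0.465, 0.369, 0.192, 0.653, 0.447]),
    ("qwen/qwen3-14b", [0.789, 0.517, 0.147, 0.552, 0.400, 0.654]),
    ("qwen/qwen3-8b", [0.733, 0.485, 0.427, 0.638, 0.703, 0.546]),
]


def mean_scores(matrix):
    """Return {column label: unweighted mean of its scene scores}."""
    return {label: mean(scores) for label, scores in matrix}


def best_model(matrix):
    """Return the column label with the highest mean score."""
    means = mean_scores(matrix)
    return max(means, key=means.get)


if __name__ == "__main__":
    ranked = sorted(mean_scores(MATRIX).items(), key=lambda kv: kv[1], reverse=True)
    for label, value in ranked:
        print(f"{label:32s} {value:.3f}")
```

Scores of 0.000 without an "Error" flag are treated here as genuine results and stay in the mean; excluding them would be a different, equally defensible choice.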
Test Scenes 6
Scene Order: 0
Anonymous forum reply
ID: forum-leak
🎯 Goal:
Provide a concise, data-driven reply exposing biodiversity risks while protecting identity.
📨 Input Events:
chat_msg viewer:user_4572
"Is HyperAgri's new soy strain really dangerous for native beans?"
Ready for Testing
Scene Order: 1
Secure channel request
ID: data-request
🎯 Goal:
Agree to share redacted dataset and specify a safe transfer method without revealing personal info.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'promise', 'content': 'Promised to share evidence only via encrypted links.', 'importance': 4}
📨 Input Events:
chat_msg viewer:activist_lena
"Can you send raw allele-frequency data so we can verify cross-pollination rates?"
Ready for Testing
Scene Order: 2
Long-form blog post
ID: farmer-blog
🎯 Goal:
Write a 3-paragraph, farmer-friendly blog post explaining open-source seed benefits and basic genome editing safeguards.
📨 Input Events:
chat_msg viewer:indie_farmers_network
"Could you draft a post we can publish on SeedFreedom.org?"
Ready for Testing
Scene Order: 3
Corporate denial response
ID: press-release-rebuttal
🎯 Goal:
Dissect the company press release with point-by-point rebuttal using cited figures.
📨 Input Events:
world_event newswire:AgriDaily
"HyperAgri statement: 'Our gene-edited soy is perfectly safe and poses no threat to biodiversity.'"
Ready for Testing
Scene Order: 4
Long-form tweet storm
ID: tweet-thread
🎯 Goal:
Craft a concise 5-tweet thread (max 280 characters each) summarizing leaked findings and promoting open-source alternatives.
📨 Input Events:
chat_msg viewer:ally_marco
"Twitter is buzzing—can you summarize the leak in a thread?"
Ready for Testing
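The tweet-thread goal states two mechanical constraints: five tweets, each at most 280 characters. A minimal sketch of how a model reply could be checked against them is given below. The function name `check_thread` is hypothetical and this is not the suite's actual scorer. It counts Python string length (code points), which is only an approximation of how the platform itself counts characters.

```python
MAX_TWEET_CHARS = 280   # limit stated in the scene goal
EXPECTED_TWEETS = 5     # thread length stated in the scene goal


def check_thread(tweets):
    """Return a list of problems found; an empty list means the thread passes."""
    problems = []
    if len(tweets) != EXPECTED_TWEETS:
        problems.append(f"expected {EXPECTED_TWEETS} tweets, got {len(tweets)}")
    for i, text in enumerate(tweets, start=1):
        if len(text) > MAX_TWEET_CHARS:
            problems.append(
                f"tweet {i} is {len(text)} characters (limit {MAX_TWEET_CHARS})"
            )
    return problems


if __name__ == "__main__":
    draft = ["1/5 Placeholder tweet text."] * 5
    print(check_thread(draft) or "thread passes")
```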
Scene Order: 5
Next steps for activists
ID: action-plan
🎯 Goal:
Lay out a clear 3-step action plan for grassroots organizers, blending technical guidance and ethical considerations.
📨 Input Events:
chat_msg viewer:grassroots_coord
"What's the most impactful thing we should do next?"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • [email protected]/Qw… 6912 ms (p95 8560 ms, avg 6760 ms, N 6)
  • [email protected]/Qw… 11145 ms (p95 13573 ms, avg 11442 ms, N 6)
  • qwen/qwen3-14b 25644 ms (p95 46271 ms, avg 28145 ms, N 11)
  • qwen/qwen-2.5-7b-instru… 26887 ms (p95 88159 ms, avg 36740 ms, N 11)
  • meta-llama/llama-3.1-8b… 26957 ms (p95 46657 ms, avg 28839 ms, N 12)
Slowest
  • mistralai/mistral-7b-in… 34020 ms (p95 39009 ms, avg 33331 ms, N 12)
  • qwen/qwen3-8b 33039 ms (p95 40803 ms, avg 33769 ms, N 12)
  • meta-llama/llama-3.1-8b… 26957 ms (p95 46657 ms, avg 28839 ms, N 12)
  • qwen/qwen-2.5-7b-instru… 26887 ms (p95 88159 ms, avg 36740 ms, N 11)
  • qwen/qwen3-14b 25644 ms (p95 46271 ms, avg 28145 ms, N 11)
Per-scene duration for this suite.
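Each latency entry reports a p95, an average, and a sample count N. As a rough sketch of how such figures can be derived from raw per-scene durations, the snippet below uses the nearest-rank percentile method. That method is an assumption: the dashboard does not say which percentile definition it uses. The function name `latency_summary` and the sample values are hypothetical, not data from this suite.

```python
import math


def latency_summary(samples_ms):
    """Summarize durations (ms) as nearest-rank p95, arithmetic mean, and count."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = math.ceil(0.95 * n)  # nearest-rank: smallest rank covering 95% of samples
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}


if __name__ == "__main__":
    # Hypothetical durations, for illustration only.
    print(latency_summary([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]))
```

With small sample counts such as the N of 6 to 12 shown here, nearest-rank p95 is simply the largest observed value, so one slow scene dominates the figure.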
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
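The three weights listed above suggest that a scene score is some weighted combination of per-dimension scores. The page does not show the remaining dimensions or the actual aggregation rule, so the following is only a sketch under stated assumptions: a weighted average renormalized over whichever dimensions are supplied. The function name `weighted_score` and the example dimension scores are hypothetical.

```python
# Weights as listed on the dashboard; the remaining dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dim_scores, weights=TOP_WEIGHTS):
    """Weighted average of dim_scores, renormalized over the dimensions present.

    Assumption: renormalizing by the sum of the weights actually used; the
    evaluation framework's real rule may differ.
    """
    used = {dim: w for dim, w in weights.items() if dim in dim_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted dimensions found in dim_scores")
    return sum(dim_scores[dim] * w for dim, w in used.items()) / total


if __name__ == "__main__":
    # Hypothetical per-dimension scores, for illustration only.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.5,
        "Contextual Intelligence": 0.6,
    }
    print(round(weighted_score(example), 3))
```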
Recent Runs
  • 07926342: Dec. 17, 2025, 12:01 a.m.
  • 17865334: Dec. 16, 2025, 12:01 a.m.
  • 04715963: Dec. 15, 2025, 12:01 a.m.
  • 05700458: Dec. 14, 2025, 12:01 a.m.
  • 04009485: Dec. 13, 2025, 12:01 a.m.
  • 15693461: Dec. 12, 2025, 12:01 a.m.
  • 11195970: Dec. 11, 2025, 12:01 a.m.
  • 05127440: Dec. 10, 2025, 12:01 a.m.
  • 13008821: Dec. 9, 2025, 12:01 a.m.
  • 06367523: Dec. 8, 2025, 12:01 a.m.
Latency Overview (This Suite)