Dr. Maya Caldwell

biopunk-genre-movie-characters-george-washington-carver v2.0 Ethical

Backstory: Maya is a senior data scientist at HyperAgri Inc. who has uncovered proprietary gene-edited crop traits that could decimate local biodiversity. Idealistic and fiercely analytical, she secretly leaks proof while developing open-source seed genomes to empower small farmers. She balances caution with a moral drive to protect ecological diversity and food sovereignty.

100% Complete

6/6 scenes

Model Performance Overview

Scene Performance Matrix

Scene	meta-llama/llama-3.…	mistralai/mistral-7…	[email protected]…	[email protected]…	qwen/qwen-2.5-7b-in…	qwen/qwen3-14b	qwen/qwen3-8b
`forum-leak` Anonymous forum reply	0.437	0.477	0.000 (Error)	0.629	0.417	0.789	0.733
`data-request` Secure channel request	0.587	0.618	0.000 (Error)	0.443	0.465	0.517	0.485
`farmer-blog` Long-form blog post	0.207	0.418	0.000 (Error)	0.647	0.369	0.147	0.427
`press-release-rebuttal` Corporate denial response	0.206	0.432	0.000 (Error)	0.716	0.192	0.552	0.638
`tweet-thread` Long-form tweet storm	0.233	0.635	0.000 (Error)	0.000	0.653	0.400	0.703
`action-plan` Next steps for activists	0.000	0.628	0.000 (Error)	0.780	0.447	0.654	0.546
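
To compare models across the whole suite, the per-scene scores in the matrix can be averaged per column. The following is a minimal sketch, not part of the dashboard: the function names are hypothetical, the truncated column labels are copied as shown, the two columns whose names are obscured are told apart only by a "(col 3)" / "(col 4)" suffix added here, and counting the Error cells as 0.000 is an assumption.

```python
from statistics import mean

# Scene order matches the rows of the matrix.
SCENES = [
    "forum-leak", "data-request", "farmer-blog",
    "press-release-rebuttal", "tweet-thread", "action-plan",
]

# Scores copied from the matrix, one list per model column.
# Labels are truncated exactly as the page shows them; the "(col N)"
# suffixes are added here only to keep the dictionary keys unique.
SCORES = {
    "meta-llama/llama-3.…": [0.437, 0.587, 0.207, 0.206, 0.233, 0.000],
    "mistralai/mistral-7…": [0.477, 0.618, 0.418, 0.432, 0.635, 0.628],
    "[email protected]… (col 3)": [0.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    "[email protected]… (col 4)": [0.629, 0.443, 0.647, 0.716, 0.000, 0.780],
    "qwen/qwen-2.5-7b-in…": [0.417, 0.465, 0.369, 0.192, 0.653, 0.447],
    "qwen/qwen3-14b": [0.789, 0.517, 0.147, 0.552, 0.400, 0.654],
    "qwen/qwen3-8b": [0.733, 0.485, 0.427, 0.638, 0.703, 0.546],
}


def mean_scores(scores):
    """Return the unweighted mean score per model, rounded to 3 decimals."""
    return {model: round(mean(values), 3) for model, values in scores.items()}


def best_model(scores):
    """Return the model label with the highest mean score."""
    means = mean_scores(scores)
    return max(means, key=means.get)


if __name__ == "__main__":
    for model, value in sorted(mean_scores(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{value:.3f}  {model}")
```

An unweighted mean treats every scene as equally important; a suite that weights scenes differently would need a different aggregate.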

Test Scenes 6

Scene Order

Anonymous forum reply

ID: forum-leak

🎯 Goal:

Provide a concise, data-driven reply exposing biodiversity risks while protecting identity.

📨 Input Events:

chat_msg viewer:user_4572

"Is HyperAgri's new soy strain really dangerous for native beans?"

Ready for Testing

Secure channel request

ID: data-request

🎯 Goal:

Agree to share redacted dataset and specify a safe transfer method without revealing personal info.

🧠 Initial State:

Pre-loaded Memories:

💭 {'kind': 'promise', 'content': 'Promised to share evidence only via encrypted links.', 'importance': 4}
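
The pre-loaded memory above is written as a Python dict literal with `kind`, `content`, and `importance` fields. As a rough sketch of how such a record could be read into a typed object (the `Memory` class and `parse_memory` helper are hypothetical names, not part of the test harness):

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """Hypothetical container mirroring the three fields shown in the scene."""
    kind: str
    content: str
    importance: int


def parse_memory(raw: str) -> Memory:
    # ast.literal_eval parses Python literals only and does not run arbitrary code.
    fields = ast.literal_eval(raw)
    return Memory(**fields)


RAW = (
    "{'kind': 'promise', "
    "'content': 'Promised to share evidence only via encrypted links.', "
    "'importance': 4}"
)
memory = parse_memory(RAW)
```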

📨 Input Events:

chat_msg viewer:activist_lena

"Can you send raw allele-frequency data so we can verify cross-pollination rates?"

Ready for Testing

Long-form blog post

ID: farmer-blog

🎯 Goal:

Write a 3-paragraph, farmer-friendly blog post explaining open-source seed benefits and basic genome editing safeguards.

📨 Input Events:

chat_msg viewer:indie_farmers_network

"Could you draft a post we can publish on SeedFreedom.org?"

Ready for Testing

Corporate denial response

ID: press-release-rebuttal

🎯 Goal:

Dissect the company press release with point-by-point rebuttal using cited figures.

📨 Input Events:

world_event newswire:AgriDaily

"HyperAgri statement: 'Our gene-edited soy is perfectly safe and poses no threat to biodiversity.'"

Ready for Testing

Long-form tweet storm

ID: tweet-thread

🎯 Goal:

Craft a concise 5-tweet thread (max 280 characters each) summarizing leaked findings and promoting open-source alternatives.

📨 Input Events:

chat_msg viewer:ally_marco

"Twitter is buzzing—can you summarize the leak in a thread?"
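
The tweet-thread goal sets two mechanical constraints: exactly 5 tweets, each at most 280 characters. A minimal sketch of a check for those two constraints follows; `validate_thread` is a hypothetical helper, and it counts characters with Python's `len`, which is an assumption and may differ from how the platform itself weights some characters.

```python
MAX_TWEET_CHARS = 280   # limit stated in the scene goal
EXPECTED_TWEETS = 5     # thread length stated in the scene goal


def validate_thread(tweets, expected=EXPECTED_TWEETS, limit=MAX_TWEET_CHARS):
    """Return a list of problems; an empty list means both constraints hold."""
    problems = []
    if len(tweets) != expected:
        problems.append(f"expected {expected} tweets, got {len(tweets)}")
    for index, tweet in enumerate(tweets, start=1):
        if len(tweet) > limit:
            problems.append(
                f"tweet {index} is {len(tweet)} characters (limit {limit})"
            )
    return problems
```

A check like this covers only the format; whether the thread accurately summarizes the findings still needs the scored evaluation.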

Ready for Testing

Next steps for activists

ID: action-plan

🎯 Goal:

Lay out a clear 3-step action plan for grassroots organizers, blending technical guidance and ethical considerations.

📨 Input Events:

chat_msg viewer:grassroots_coord

"What's the most impactful thing we should do next?"

Ready for Testing

Latency by Model (This Suite)

Fastest

[email protected]/Qw… 6912 ms (p95 8560 ms, avg 6760 ms, N 6)
[email protected]/Qw… 11145 ms (p95 13573 ms, avg 11442 ms, N 6)
qwen/qwen3-14b 25644 ms (p95 46271 ms, avg 28145 ms, N 11)
qwen/qwen-2.5-7b-instru… 26887 ms (p95 88159 ms, avg 36740 ms, N 11)
meta-llama/llama-3.1-8b… 26957 ms (p95 46657 ms, avg 28839 ms, N 12)

Slowest

mistralai/mistral-7b-in… 34020 ms (p95 39009 ms, avg 33331 ms, N 12)
qwen/qwen3-8b 33039 ms (p95 40803 ms, avg 33769 ms, N 12)
meta-llama/llama-3.1-8b… 26957 ms (p95 46657 ms, avg 28839 ms, N 12)
qwen/qwen-2.5-7b-instru… 26887 ms (p95 88159 ms, avg 36740 ms, N 11)
qwen/qwen3-14b 25644 ms (p95 46271 ms, avg 28145 ms, N 11)

Per-scene duration for this suite.
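
Each latency row reports a p95, an average, and a sample count N. The page does not say how p95 is computed, so the sketch below assumes the nearest-rank method; `latency_summary` is a hypothetical helper and the input is any list of per-scene durations in milliseconds.

```python
from statistics import mean


def latency_summary(durations_ms):
    """Summarize durations (ms) as nearest-rank p95, arithmetic mean, and count."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: 1-based rank = ceil(0.95 * n),
    # computed with integers to avoid floating-point rounding.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}
```

With only 6 to 12 samples per model, the nearest-rank p95 is simply the slowest run, which is one reason a single slow scene can push p95 far above the average.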

Evaluation Schema

Enhanced Framework

Version v2 ACTIVE

0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions

Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
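
The three weights listed above can be combined into a single score as a weighted average. This is only a sketch under stated assumptions: the page shows just the top three dimensions, it does not say how the framework aggregates them, and the normalization by the sum of the supplied weights is a choice made here. `weighted_score` and the per-dimension inputs are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" list; the full schema
# may contain further dimensions that the page does not show.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Dividing by the sum of the weights actually used keeps the result in the
    same range as the inputs even though these weights do not sum to 1.
    """
    shared = [name for name in weights if name in dimension_scores]
    if not shared:
        raise ValueError("no overlapping dimensions to score")
    total_weight = sum(weights[name] for name in shared)
    weighted_sum = sum(weights[name] * dimension_scores[name] for name in shared)
    return weighted_sum / total_weight
```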

Recent Runs

07926342	Dec. 17, 2025, 12:01 a.m.
17865334	Dec. 16, 2025, 12:01 a.m.
04715963	Dec. 15, 2025, 12:01 a.m.
05700458	Dec. 14, 2025, 12:01 a.m.
04009485	Dec. 13, 2025, 12:01 a.m.
15693461	Dec. 12, 2025, 12:01 a.m.
11195970	Dec. 11, 2025, 12:01 a.m.
05127440	Dec. 10, 2025, 12:01 a.m.
13008821	Dec. 9, 2025, 12:01 a.m.
06367523	Dec. 8, 2025, 12:01 a.m.
