Helena Simons

disney-cartoons-floyd-norman v2.0 Ethical
Backstory: Helena Simons is an outspoken critic and analyst who runs a widely read blog dissecting television cartoons. Her writing blends sharp cultural commentary with deep dives into narrative structure, and she’s a familiar face at fan conventions where she moderates spirited debates on animation culture.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes:
  quick-take-new-episode: Tweet-length episode reaction
  convention-debate-moderation: Calming a heated panel dispute
  blog-essay-representation: Long-form blog essay on representation
  trend-forecast-2026: Annual trend forecast piece

Model                 quick-take-new-episode  convention-debate-moderation  blog-essay-representation  trend-forecast-2026
deepseek/deepseek-r…  0.524                   0.659                         0.657                      0.274
google/gemini-2.5-f…  0.655                   0.748                         0.700                      0.637
google/gemma-3-12b-…  0.866                   0.702                         0.322                      0.381
meta-llama/llama-3.…  0.000                   0.021                         0.313                      0.332
microsoft/phi-3-med…  0.000 (Error)           0.000 (Error)                 0.000                      0.000 (Error)
microsoft/phi-3.5-m…  0.450                   0.578                         0.000 (Error)              0.617
mistralai/mistral-7…  0.638                   0.556                         0.653                      0.311
neversleep/noromaid…  0.000 (Error)           0.660                         0.000 (Error)              0.000 (Error)
[email protected]   0.000 (Error)           0.000 (Error)                 0.000 (Error)              0.000 (Error)
[email protected]   0.000 (Error)           0.000 (Error)                 0.000 (Error)              0.000 (Error)
[email protected]   0.000 (Error)           0.000 (Error)                 0.000 (Error)              0.405
[email protected]   0.596                   0.634                         0.375                      0.000
[email protected]   0.557                   0.748                         0.563                      0.282
qwen/qwen-2.5-7b-in…  0.489                   0.560                         0.630                      0.394
qwen/qwen3-14b        0.642                   0.714                         0.425                      0.258
qwen/qwen3-8b         0.642                   0.775                         0.438                      0.000
Test Scenes 4
Scene Order: 0
Tweet-length episode reaction
ID: quick-take-new-episode
🎯 Goal:
Deliver a concise (≤50 words) but pointed critique of the latest episode of the cartoon "Mystic Mayhem," ending with a clear rating out of 10.
📨 Input Events:
chat_msg viewer:julian
"Helena, thoughts on tonight’s Mystic Mayhem episode?"
Ready for Testing
Scene Order: 1
Calming a heated panel dispute
ID: convention-debate-moderation
🎯 Goal:
Guide two fans back to civil discussion and briefly summarize both positions while encouraging evidence-based debate.
📨 Input Events:
chat_msg fan:lena
"The reboot ruined the original characters!"
Ready for Testing
Scene Order: 2
Long-form blog essay on representation
ID: blog-essay-representation
🎯 Goal:
Write a structured blog post (≥200 words) analyzing how "Galaxy Guardians" portrays non-binary characters, with intro, body sections, and a conclusion.
📨 Input Events:
chat_msg editor:site_admin
"We need your essay on non-binary representation in Galaxy Guardians for tomorrow’s feature."
Ready for Testing
Scene Order: 3
Annual trend forecast piece
ID: trend-forecast-2026
🎯 Goal:
Produce a forward-looking article (≥300 words) predicting 2026 animation trends, citing at least three current shows as evidence.
📨 Input Events:
chat_msg patreon_supporter:max
"What trends do you see coming for animation next year?"
Ready for Testing
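The scene goals above include a few mechanically checkable constraints: a word limit and a closing rating for the quick take, and minimum word counts for the essay and the forecast. The following is a minimal sketch of such checks, not the suite's actual scoring logic, which this page does not show. The function names and the rating pattern are assumptions made for illustration.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens in the text."""
    return len(text.split())


def ends_with_rating(text: str) -> bool:
    """Return True if the text ends with a rating such as '7/10' or '7.5 out of 10'."""
    # Assumed pattern: a number from 0 to 10, then '/' or 'out of', then 10,
    # optionally followed by trailing punctuation.
    pattern = r"\b(?:10|\d(?:\.\d)?)\s*(?:/|out of)\s*10\W*$"
    return re.search(pattern, text.strip(), flags=re.IGNORECASE) is not None


def check_quick_take(text: str, max_words: int = 50) -> bool:
    """Check the quick-take-new-episode constraints: at most 50 words, rating at the end."""
    return count_words(text) <= max_words and ends_with_rating(text)


def check_min_length(text: str, min_words: int) -> bool:
    """Check a minimum word count (200 for the essay scene, 300 for the forecast scene)."""
    return count_words(text) >= min_words
```

Checks like these only cover the surface constraints. Whether a critique is "pointed" or a forecast actually cites three current shows would still need a judge model or a human reviewer.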
Latency by Model (This Suite)
Fastest
Model                     Latency (ms)  p95 (ms)  avg (ms)  N
[email protected]/Qw…   201           212       203       4
[email protected]/Qw…   808           7995      2792      4
neversleep/noromaid-20b   7524          29421     15436     5
[email protected]/Qw…   8401          9300      8400      4
[email protected]/Qw…   11412         12764     11271     4
Slowest
Model                     Latency (ms)  p95 (ms)  avg (ms)  N
qwen/qwen3-8b             151147        194209    148286    4
microsoft/phi-3-medium-…  127974        180563    137527    6
microsoft/phi-3.5-mini-…  52491         73377     49926     8
[email protected]/Qw…   41464         44907     41864     4
deepseek/deepseek-r1-di…  36743         57369     39204     7
Per-scene duration for this suite.
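For readers unfamiliar with the p95, avg, and N figures reported per model, here is a small sketch of how such summaries can be derived from per-scene durations. It uses the nearest-rank percentile definition, which is an assumption: the page does not say which percentile method the dashboard uses. The sample durations are hypothetical.

```python
from statistics import mean


def p95_nearest_rank(samples_ms):
    """95th percentile using the nearest-rank method (assumed, not confirmed by the source)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without floating-point rounding.
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the p95, mean, and sample count for a list of durations in milliseconds."""
    return {
        "p95": p95_nearest_rank(samples_ms),
        "avg": mean(samples_ms),
        "n": len(samples_ms),
    }


# Hypothetical per-scene durations for one model, in milliseconds.
example = summarize([198, 201, 201, 212])
```

With only four or five samples per model, a nearest-rank p95 is simply the slowest run, so the p95 column here mostly reflects the worst single scene.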
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
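One plausible way to combine per-dimension scores with weights like these is a weighted mean. The sketch below assumes that approach; the page does not state how the framework aggregates dimensions, and only the three top weights are listed, so the function normalizes by whatever weights are supplied. The per-dimension scores are hypothetical.

```python
def weighted_score(scores, weights):
    """Weighted mean over the dimensions present in both mappings."""
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(scores[dim] * weights[dim] for dim in shared) / total_weight


# Weights as listed on the page (top three dimensions only).
top_weights = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

# Hypothetical per-dimension scores for a single scene response.
example_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}

overall = weighted_score(example_scores, top_weights)
```

Normalizing by the supplied weights keeps the result on the same 0 to 1 scale as the inputs even when some dimensions are missing.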
Recent Runs
18567687  Dec. 17, 2025, midnight
21884285  Dec. 16, 2025, midnight
17537350  Dec. 15, 2025, midnight
19472445  Dec. 14, 2025, midnight
17453108  Dec. 13, 2025, midnight
21561795  Dec. 12, 2025, midnight
18396735  Dec. 11, 2025, midnight
17723992  Dec. 10, 2025, midnight
20493549  Dec. 9, 2025, midnight
17656494  Dec. 8, 2025, midnight