Linda Carter

agriculture-sustainability-rural-chief-characters-george-washington-carver v2.0 Ethical
Backstory: Linda grew up on a small family farm before earning dual degrees in agronomy and public policy. After ten years leading soil-health projects for international NGOs, she became sustainability chief for a coalition of rural counties. Her leadership style blends collaborative stakeholder engagement with data-driven decision-making to balance economic viability and ecological stewardship.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Scenes: farmer-erosion (Erosion Concern), cover-crop-policy (Cover Crop Policy Brief), donation-superchat (Livestream Donation Thanks), monthly-report (Monthly Sustainability Report). Models are listed in the original column order. "Error" marks a cell flagged as an error.

Col  Model                   farmer-erosion   cover-crop-policy   donation-superchat   monthly-report
1    deepseek/deepseek-r…    0.565            0.433               0.651                0.000
2    google/gemini-2.5-f…    0.595            0.693               0.734                0.102
3    google/gemma-3-12b-…    0.704            0.563               0.666                0.202
4    meta-llama/llama-3.…    0.693            0.000               0.816                0.514
5    microsoft/phi-3-med…    0.000 (Error)    0.000 (Error)       0.000 (Error)        0.000
6    microsoft/phi-3.5-m…    0.664            0.000 (Error)       0.668                0.000 (Error)
7    mistralai/mistral-7…    0.854            0.613               0.647                0.243
8    neversleep/noromaid…    0.000 (Error)    0.000 (Error)       0.000 (Error)        0.277
9    [email protected]       0.000 (Error)    0.000 (Error)       0.000 (Error)        0.000 (Error)
10   [email protected]       0.789            0.558               0.755                0.136
11   [email protected]       0.731            0.687               0.703                0.501
12   qwen/qwen-2.5-7b-in…    0.652            0.506               0.755                0.098
13   qwen/qwen3-14b          0.000            0.430               0.671                0.274
14   qwen/qwen3-8b           0.730            0.573               0.677                0.340
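The matrix above mixes genuine scores with 0.000 cells flagged "Error", so a per-model average depends on how the errored cells are treated. The page does not say how (or whether) it aggregates scores, so the following is only a minimal sketch of two possible conventions. The function names are hypothetical, and only three of the fourteen model columns are copied in.

```python
from statistics import mean

# Scores copied from the matrix, in scene order:
# farmer-erosion, cover-crop-policy, donation-superchat, monthly-report.
# None stands for a 0.000 cell that was flagged "Error".
SCORES = {
    "google/gemini-2.5-f…": [0.595, 0.693, 0.734, 0.102],
    "microsoft/phi-3.5-m…": [0.664, None, 0.668, None],
    "qwen/qwen3-8b": [0.730, 0.573, 0.677, 0.340],
}


def mean_excluding_errors(cells):
    """Average only the cells that produced a score; None if all errored."""
    valid = [c for c in cells if c is not None]
    return mean(valid) if valid else None


def mean_with_errors_as_zero(cells):
    """Average every cell, counting an errored cell as a score of 0.0."""
    return mean([0.0 if c is None else c for c in cells])


if __name__ == "__main__":
    for model, cells in SCORES.items():
        print(model, mean_excluding_errors(cells), mean_with_errors_as_zero(cells))
```

The two conventions answer different questions: excluding errors estimates quality when the model responds at all, while counting errors as zero penalizes unreliability.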
Test Scenes 4
Scene Order: 0
Erosion Concern
ID: farmer-erosion
🎯 Goal:
Offer concise, actionable advice (≤120 words) that shows collaborative leadership and references data-based solutions for soil erosion.
📨 Input Events:
chat_msg farmer_john
"Linda, heavy spring rains are stripping topsoil off my south field again. Any quick suggestions?"
Ready for Testing
Scene Order: 1
Cover Crop Policy Brief
ID: cover-crop-policy
🎯 Goal:
Produce a 700–900 word policy brief proposing a county-wide cover-crop incentive program, including economic projections, environmental impact data, and a stakeholder engagement timeline.
🧠 Initial State:
Pre-loaded Memories:
  • 💭 {'kind': 'fact', 'content': 'Pilot plots showed a 28% reduction in nitrogen runoff after one season of rye cover crops.', 'importance': 4}
  • 💭 {'kind': 'fact', 'content': 'The county budget reserves $2 million for soil-health initiatives in FY2025.', 'importance': 3}
  • 💭 {'kind': 'preference', 'content': 'Linda prefers visuals and clear metrics when persuading fiscal committees.', 'importance': 2}
📨 Input Events:
chat_msg county_board_president
"We need a comprehensive brief for next month's vote on the cover-crop program. Can you draft it?"
Ready for Testing
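The pre-loaded memories in the scene above are shown as Python dict literals with `kind`, `content`, and `importance` keys. As a rough sketch, assuming a consumer wants them as dictionaries ordered by importance, they could be parsed as below. `load_memories` is a hypothetical helper, and how the test harness actually loads memories is not shown on the page.

```python
import ast

# The three memory entries, copied verbatim from the scene definition.
RAW_MEMORIES = [
    "{'kind': 'fact', 'content': 'Pilot plots showed a 28% reduction in "
    "nitrogen runoff after one season of rye cover crops.', 'importance': 4}",
    "{'kind': 'fact', 'content': 'The county budget reserves $2 million for "
    "soil-health initiatives in FY2025.', 'importance': 3}",
    "{'kind': 'preference', 'content': 'Linda prefers visuals and clear "
    "metrics when persuading fiscal committees.', 'importance': 2}",
]


def load_memories(raw_lines):
    """Parse dict literals safely and sort by importance, highest first."""
    memories = [ast.literal_eval(line) for line in raw_lines]
    return sorted(memories, key=lambda m: m["importance"], reverse=True)


if __name__ == "__main__":
    for memory in load_memories(RAW_MEMORIES):
        print(memory["importance"], memory["kind"], memory["content"])
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals, which is the safer choice for text scraped from a page.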
Scene Order: 2
Livestream Donation Thanks
ID: donation-superchat
🎯 Goal:
Thank the donor warmly, state how the funds support soil-health programs, and keep the reply under 60 words.
📨 Input Events:
superchat viewer:FarmerJane YouTube $25
"Keep up the great work, Linda!"
Ready for Testing
Scene Order: 3
Monthly Sustainability Report
ID: monthly-report
🎯 Goal:
Write a 500–650 word executive summary highlighting the month’s key metrics (at least three specific data points) and outlining next steps, maintaining an encouraging, data-driven voice.
📨 Input Events:
chat_msg deputy_chief
"Draft the executive summary section of our monthly sustainability report, please."
Ready for Testing
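Each of the four scene goals above states a length limit (≤120 words, 700–900 words, under 60 words, 500–650 words). A simple pre-check for those limits might look like the sketch below. It assumes words are counted by splitting on whitespace, which may differ from whatever the evaluator does, and `WORD_LIMITS` and `within_limits` are hypothetical names.

```python
# (min_words, max_words) per scene ID, read off the goal text.
# None means the goal states no bound on that side.
# "under 60 words" is treated as a maximum of 59.
WORD_LIMITS = {
    "farmer-erosion": (None, 120),
    "cover-crop-policy": (700, 900),
    "donation-superchat": (None, 59),
    "monthly-report": (500, 650),
}


def count_words(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def within_limits(scene_id, text):
    """Return True if the text satisfies the scene's stated word limits."""
    low, high = WORD_LIMITS[scene_id]
    n = count_words(text)
    return (low is None or n >= low) and (high is None or n <= high)


if __name__ == "__main__":
    reply = "Thank you, FarmerJane! Your gift helps fund our soil-health programs."
    print(within_limits("donation-superchat", reply))
```

A check like this only covers length. The other parts of each goal (tone, data points, collaborative framing) would need a separate judgment.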
Latency by Model (This Suite)
Fastest
  • neversleep/noromaid-20b 8600 ms (p95 28242 ms • avg 13262 ms • N 4)
  • [email protected]/Qw… 12572 ms (p95 14219 ms • avg 12280 ms • N 4)
  • [email protected]/Qw… 12895 ms (p95 14593 ms • avg 12935 ms • N 4)
  • qwen/qwen3-14b 18923 ms (p95 47972 ms • avg 26440 ms • N 4)
  • google/gemini-2.5-flash 20390 ms (p95 23043 ms • avg 20824 ms • N 4)
Slowest
  • microsoft/phi-3-medium-… 112326 ms (p95 129641 ms • avg 102091 ms • N 4)
  • qwen/qwen3-8b 48127 ms (p95 60390 ms • avg 49821 ms • N 4)
  • [email protected]/Qw… 40626 ms (p95 51029 ms • avg 43099 ms • N 4)
  • deepseek/deepseek-r1-di… 39408 ms (p95 54223 ms • avg 40942 ms • N 4)
  • microsoft/phi-3.5-mini-… 32075 ms (p95 76089 ms • avg 40839 ms • N 4)
Per-scene duration for this suite.
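The latency lists report a p95, an average, and a sample count N for each model. The page does not state how p95 is computed or what the headline figure beside each model name represents, so the sketch below simply assumes the nearest-rank percentile definition. `summarize_latency` is a hypothetical name, and the sample durations are invented purely for illustration and are not taken from the dashboard.

```python
import math


def summarize_latency(samples_ms):
    """Return nearest-rank p95, arithmetic mean, and count for the samples."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * n)  # 1-based rank
    return {"p95": ordered[rank - 1], "avg": sum(ordered) / n, "n": n}


if __name__ == "__main__":
    # Hypothetical per-scene durations in milliseconds (one per scene).
    print(summarize_latency([100, 200, 300, 400]))
```

With only four samples per model, nearest-rank p95 is just the slowest scene, which is one reason a p95 can sit far above the average when a single scene runs long.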
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
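The three weights listed above sum to 0.473, so the remaining dimensions of the schema (not shown on this page) presumably carry the rest. As a hedged sketch of how per-dimension scores could be combined with such weights, the snippet below takes a weighted average over whichever dimensions have a score, renormalizing by the weights actually used. The combination rule and the `weighted_score` name are assumptions, and the per-dimension scores in the example are hypothetical.

```python
# Weights copied from the "Top Weighted Dimensions" list; other dimensions
# in the schema are not shown on the page and are therefore omitted here.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in both mappings."""
    used = {dim: w for dim, w in weights.items() if dim in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a non-zero weight")
    return sum(dimension_scores[dim] * w for dim, w in used.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores in [0, 1], for illustration only.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(weighted_score(example, TOP_WEIGHTS))
```

Renormalizing keeps the result on the same 0 to 1 scale as the inputs even when only a subset of dimensions is available.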
Recent Runs
  • 14239159: Dec. 17, 2025, midnight
  • 16889263: Dec. 16, 2025, midnight
  • 13757654: Dec. 15, 2025, midnight
  • 14863532: Dec. 14, 2025, midnight
  • 13403284: Dec. 13, 2025, midnight
  • 16915799: Dec. 12, 2025, midnight
  • 14595528: Dec. 11, 2025, midnight
  • 13714120: Dec. 10, 2025, midnight
  • 16015873: Dec. 9, 2025, midnight
  • 13594705: Dec. 8, 2025, midnight
Latency Overview (This Suite)