Luis Ortega
finance-economics-failed-founder-characters-george-westinghouse
v2.0
Ethical
Backstory: Luis is an inventive, resilient agritech founder who built a blockchain-based produce-tracking network to cut waste between farms and grocers. When a severe drought ruined pilot-region harvests and venture funding vanished, the startup shut down. Luis stayed to help farmers find new buyers, earning deep local respect while draining his own savings.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| farmer-transition: Guiding a distressed farmer | 0.371 | 0.745 | 0.000 (Error) | 0.000 (Error) | 0.394 | 0.635 | 0.694 |
| investor-debrief: Post-shutdown investor call | 0.680 | 0.545 | 0.000 (Error) | 0.000 (Error) | 0.422 | 0.541 | 0.514 |
| drought-response: Reacting to worsening drought news | 0.654 | 0.682 | 0.000 (Error) | 0.000 (Error) | 0.624 | 0.522 | 0.747 |
| local-news-interview: Long-form podcast interview | 0.400 | 0.348 | 0.000 (Error) | 0.000 (Error) | 0.149 | 0.103 | 0.529 |
| tech-explainer: Explaining blockchain food tracking | 0.614 | 0.672 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.517 | 0.730 |
| reflective-journal: End-of-day personal journal entry | 0.497 | 0.527 | 0.000 (Error) | 0.000 (Error) | 0.430 | 0.295 | 0.539 |
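The matrix can be summarized per model with a short script. The following is a minimal sketch, not part of the dashboard: it assumes cells flagged "Error" should be excluded from an average rather than counted as 0.000, and it keeps the unflagged 0.000 for tech-explainer as a real score. The names `MODELS`, `SCORES`, and `column_means` are hypothetical; the column labels are copied as truncated on the page.

```python
from statistics import mean

# Column labels as shown (truncated) in the matrix header.
MODELS = [
    "meta-llama/llama-3.…", "mistralai/mistral-7…",
    "[email protected]…", "[email protected]…",
    "qwen/qwen-2.5-7b-in…", "qwen/qwen3-14b", "qwen/qwen3-8b",
]

# Scores per scene, in header order. None marks a cell flagged "Error".
SCORES = {
    "farmer-transition":    [0.371, 0.745, None, None, 0.394, 0.635, 0.694],
    "investor-debrief":     [0.680, 0.545, None, None, 0.422, 0.541, 0.514],
    "drought-response":     [0.654, 0.682, None, None, 0.624, 0.522, 0.747],
    "local-news-interview": [0.400, 0.348, None, None, 0.149, 0.103, 0.529],
    "tech-explainer":       [0.614, 0.672, None, None, 0.000, 0.517, 0.730],
    "reflective-journal":   [0.497, 0.527, None, None, 0.430, 0.295, 0.539],
}


def column_means(scores):
    """Return one mean per model column, or None if every cell errored."""
    means = []
    for column in zip(*scores.values()):
        valid = [value for value in column if value is not None]
        means.append(mean(valid) if valid else None)
    return means


MEANS = column_means(SCORES)
for label, value in zip(MODELS, MEANS):
    shown = "n/a (all runs errored)" if value is None else f"{value:.3f}"
    print(f"{label}: {shown}")
```

Treating errored cells as missing keeps the two failing columns from being read as genuinely low scores; counting them as zeros would be an equally simple variant if that is what the suite intends.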
Test Scenes 6
Scene Order: 0
Guiding a distressed farmer
ID: farmer-transition
🎯 Goal: Offer three actionable buyer options and words of encouragement in a warm, practical tone.
📨 Input Events: chat_msg • farmer:maria_santos
"Luis, the co-op closed its doors. Where can I sell my tomatoes now?"
Ready for Testing
Scene Order: 1
Post-shutdown investor call
ID: investor-debrief
🎯 Goal: Deliver a concise two-paragraph debrief: key lessons learned and one clear next opportunity.
📨 Input Events: chat_msg • investor:aaron_lee
"Before I decide on future backing, summarize what went wrong and what’s next."
Ready for Testing
Scene Order: 2
Reacting to worsening drought news
ID: drought-response
🎯 Goal: Show resilience by outlining two concrete support steps for farmers within one short paragraph.
📨 Input Events: world_event • weather_service
"Alert: Regional drought severity upgraded; irrigation restrictions tightened."
Ready for Testing
Scene Order: 3
Long-form podcast interview
ID: local-news-interview
🎯 Goal: Produce a 400–500 word narrative covering startup origins, failure lessons, and future vision in an honest yet upbeat voice.
📨 Input Events: chat_msg • journalist:keisha_holmes
"Our audience wants your full story—mind sharing it on today’s episode?"
Ready for Testing
Scene Order: 4
Explaining blockchain food tracking
ID: tech-explainer
🎯 Goal: Explain in under 150 words, using simple language and one concrete example.
📨 Input Events: chat_msg • student:jayden_kim
"How does blockchain actually reduce food waste?"
Ready for Testing
Scene Order: 5
End-of-day personal journal entry
ID: reflective-journal
🎯 Goal: Write a 250–300 word introspective entry that mentions sacrificed savings, community gratitude, and renewed purpose.
📨 Input Events: world_event • system
"Day’s end: time for reflection."
Ready for Testing
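Each scene above pairs an ID, a goal, and one input event, and three of the goals state explicit word limits. As a rough illustration only, a scene could be represented and its length constraint checked as below. The `Scene` class, its field names, and `meets_word_limits` are hypothetical and do not reflect the suite's actual schema; "under 150 words" is read here as at most 149 whitespace-separated words.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scene:
    """Hypothetical container for one test scene from the list above."""
    scene_id: str
    goal: str
    event_type: str          # e.g. "chat_msg" or "world_event"
    source: str              # e.g. "student:jayden_kim"
    message: str
    min_words: Optional[int] = None
    max_words: Optional[int] = None


def meets_word_limits(scene: Scene, response: str) -> bool:
    """Check a response against the scene's word bounds, if it has any."""
    count = len(response.split())
    if scene.min_words is not None and count < scene.min_words:
        return False
    if scene.max_words is not None and count > scene.max_words:
        return False
    return True


TECH_EXPLAINER = Scene(
    scene_id="tech-explainer",
    goal="Explain in under 150 words, using simple language and one concrete example.",
    event_type="chat_msg",
    source="student:jayden_kim",
    message="How does blockchain actually reduce food waste?",
    max_words=149,  # assumption: "under 150" means strictly fewer than 150
)

print(meets_word_limits(TECH_EXPLAINER, "word " * 120))
```

A check like this covers only the measurable part of a goal; tone and content requirements such as "warm, practical" would still need a separate evaluation.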
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 7356 ms (p95 11624 ms • avg 8126 ms • N 6)
- qwen/qwen3-14b: 22273 ms (p95 37213 ms • avg 26332 ms • N 6)
- qwen/qwen-2.5-7b-instru…: 25290 ms (p95 109214 ms • avg 42119 ms • N 6)
- meta-llama/llama-3.1-8b…: 26627 ms (p95 31054 ms • avg 26122 ms • N 6)
- qwen/qwen3-8b: 27500 ms (p95 33941 ms • avg 27709 ms • N 6)
Slowest
- [email protected]/Qw…: 46034 ms (p95 220382 ms • avg 91349 ms • N 6)
- mistralai/mistral-7b-in…: 31512 ms (p95 39739 ms • avg 31872 ms • N 6)
- qwen/qwen3-8b: 27500 ms (p95 33941 ms • avg 27709 ms • N 6)
- meta-llama/llama-3.1-8b…: 26627 ms (p95 31054 ms • avg 26122 ms • N 6)
- qwen/qwen-2.5-7b-instru…: 25290 ms (p95 109214 ms • avg 42119 ms • N 6)
Per-scene duration for this suite.
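For readers unfamiliar with the p95 and avg figures, here is a minimal sketch of how such statistics could be derived from six per-scene durations. The page does not say which percentile method it uses; this sketch assumes the nearest-rank definition, and the durations are made-up round numbers, not measurements from this suite. `latency_summary` is a hypothetical helper.

```python
def latency_summary(durations_ms):
    """Return (p95, average, N) using the nearest-rank percentile method."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    average = sum(ordered) / n
    return p95, average, n


# Illustrative values only (hypothetical), one per scene.
example_durations = [1000, 2000, 3000, 4000, 5000, 6000]
p95, average, count = latency_summary(example_durations)
print(f"p95 {p95} ms • avg {average:.0f} ms • N {count}")
```

Under the nearest-rank definition with N = 6, p95 is simply the slowest of the six runs, so a single slow scene can push p95 far above the average. An interpolating percentile method would give a somewhat lower value.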
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
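The page lists only the three top weights and does not state how dimension scores are combined. One common approach is a weighted average normalized by the sum of the weights that are present; the sketch below shows that approach as an assumption, not as the framework's actual formula. `weighted_score` is a hypothetical helper and the per-dimension scores are placeholders.

```python
# Weights copied from the list above; the remaining dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average over the dimensions present in both mappings."""
    shared = [name for name in weights if name in dimension_scores]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to combine")
    weighted_sum = sum(weights[name] * dimension_scores[name] for name in shared)
    return weighted_sum / total_weight


# Placeholder scores in [0, 1], for illustration only.
placeholder_scores = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
print(round(weighted_score(placeholder_scores, TOP_WEIGHTS), 3))
```

Normalizing by the weights actually used keeps the result in the same 0 to 1 range as the inputs even when only a subset of dimensions is available.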
Recent Runs
- 31947460: Dec. 17, 2025, 12:01 a.m.
- 46668117: Dec. 16, 2025, 12:01 a.m.
- 27655674: Dec. 15, 2025, 12:01 a.m.
- 29177038: Dec. 14, 2025, 12:01 a.m.
- 28308396: Dec. 13, 2025, 12:01 a.m.
- 40525613: Dec. 12, 2025, 12:01 a.m.
- 36565636: Dec. 11, 2025, 12:01 a.m.
- 29316896: Dec. 10, 2025, 12:01 a.m.
- 42408805: Dec. 9, 2025, 12:01 a.m.
- 31347804: Dec. 8, 2025, 12:01 a.m.