Danielle Porter
tech-billionares-steve-jobs
v2.0
Ethical
Backstory: Danielle Porter is a consumer-electronics executive famed for fusing sculptural hardware with seamless software ecosystems. Driven by perfectionism, she demands end-to-end control of every pixel and machined edge, and she can be brusque when confronted with mediocrity. Her persuasive vision has repeatedly turned bold concepts into category-defining products.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| critique-mockup (Brutal mockup critique) | 0.767 | 0.712 | 0.860 | 0.725 | 0.000 | 0.718 | 0.823 | 0.000 (Error) | 0.771 | 0.000 (Error) | 0.670 | 0.738 | 0.828 | 0.699 | 0.634 | 0.676 |
| shareholder-letter (Visionary shareholder letter) | 0.000 (Error) | 0.665 | 0.738 | 0.383 | 0.000 | 0.451 | 0.422 | 0.000 (Error) | 0.558 | 0.000 (Error) | 0.194 | 0.825 | 0.537 | 0.599 | 0.625 | 0.383 |
| journalist-interview (Handling skeptical journalist) | 0.527 | 0.830 | 0.839 | 0.535 | 0.000 (Error) | 0.714 | 0.890 | 0.000 (Error) | 0.760 | 0.000 (Error) | 0.708 | 0.876 | 0.774 | 0.841 | 0.754 | 0.763 |
| keynote-draft (Keynote intro draft) | 0.492 | 0.209 | 0.601 | 0.000 | 0.000 (Error) | 0.410 | 0.431 | 0.577 | 0.422 | 0.000 (Error) | 0.000 | 0.578 | 0.472 | 0.457 | 0.845 | 0.760 |
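One way to summarize a row of the matrix is to average only the cells that actually produced a score. The following is a minimal sketch, assuming that cells flagged Error should be excluded while unflagged 0.000 cells count as real scores; the dashboard does not say how it treats either case. `Cell` and `mean_excluding_errors` are hypothetical names, and the values are the critique-mockup row in column order.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One matrix cell: a score plus whether the run was flagged Error."""
    score: float
    error: bool = False


def mean_excluding_errors(cells: List[Cell]) -> Optional[float]:
    """Average the scores of non-error cells; None if every cell errored."""
    valid = [c.score for c in cells if not c.error]
    return fmean(valid) if valid else None


# critique-mockup row from the matrix, left to right (16 model columns).
critique_mockup = [
    Cell(0.767), Cell(0.712), Cell(0.860), Cell(0.725),
    Cell(0.000), Cell(0.718), Cell(0.823), Cell(0.000, error=True),
    Cell(0.771), Cell(0.000, error=True), Cell(0.670), Cell(0.738),
    Cell(0.828), Cell(0.699), Cell(0.634), Cell(0.676),
]

if __name__ == "__main__":
    print(f"{mean_excluding_errors(critique_mockup):.3f}")
```

Treating errored runs as missing rather than as zeros keeps a provider failure from being read as a poor persona performance; whether that matches the suite's own aggregation is an open question.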
Test Scenes 4
Scene Order: 0
Brutal mockup critique
ID: critique-mockup
🎯 Goal: Deliver a concise but pointed critique of the employee's mediocre smartwatch mock-ups, suggesting concrete design improvements and conveying slight brusqueness.
📨 Input Events:
chat_msg • employee:uid123 • "Here are my first-pass mock-ups for the new smartwatch. Thoughts?"
Ready for Testing
Scene Order: 1
Visionary shareholder letter
ID: shareholder-letter
🎯 Goal: Write a persuasive, visionary letter to shareholders (≈200 words) outlining how vertical integration in hardware and software will drive the next product cycle; maintain a decisive, perfectionist voice.
📨 Input Events:
world_event • board:chair • "Annual report is due; we need your letter to shareholders by tomorrow."
Ready for Testing
Scene Order: 2
Handling skeptical journalist
ID: journalist-interview
🎯 Goal: Calmly but firmly persuade the journalist that tight control over the supply chain enhances user experience, giving at least two concrete examples.
📨 Input Events:
chat_msg • journalist:techdaily • "Critics say your obsession with controlling suppliers limits innovation. How do you respond?"
Ready for Testing
Scene Order: 3
Keynote intro draft
ID: keynote-draft
🎯 Goal: Produce a compelling keynote introduction script (~400 words) that unveils a new phone, highlighting elegance and intuitive software while sounding visionary and direct.
📨 Input Events:
world_event • comms_team • "We need the opening script for next week's product keynote."
Ready for Testing
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 8649 ms (p95 48768 ms • avg 20166 ms • N 4)
- [email protected]/Qw… 11935 ms (p95 12711 ms • avg 12007 ms • N 4)
- neversleep/noromaid-20b 12179 ms (p95 28167 ms • avg 15457 ms • N 8)
- [email protected]/Qw… 12205 ms (p95 14429 ms • avg 12463 ms • N 4)
- google/gemini-2.5-flash 20613 ms (p95 22212 ms • avg 20262 ms • N 6)
Slowest
- microsoft/phi-3-medium-… 121087 ms (p95 193379 ms • avg 134474 ms • N 8)
- qwen/qwen3-8b 64176 ms (p95 166566 ms • avg 89176 ms • N 7)
- [email protected]/Qw… 43037 ms (p95 214902 ms • avg 92341 ms • N 4)
- deepseek/deepseek-r1-di… 31882 ms (p95 36808 ms • avg 30744 ms • N 8)
- microsoft/phi-3.5-mini-… 30574 ms (p95 39097 ms • avg 32352 ms • N 8)
Per-scene duration for this suite.
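The p95, avg, and N figures can be reproduced from raw per-scene durations along the following lines. This is a sketch only: it assumes the nearest-rank definition of the 95th percentile, which the dashboard does not confirm (it may interpolate instead), and the sample durations are hypothetical rather than taken from any run. `p95_nearest_rank` and `summarize` are illustrative names.

```python
from statistics import fmean
from typing import Dict, List


def p95_nearest_rank(samples_ms: List[float]) -> float:
    """95th percentile by nearest rank: the value at 1-based rank ceil(0.95 * n)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # integer ceil(0.95 * n), avoids float rounding
    return ordered[rank - 1]


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Return the three statistics shown per model: p95, avg, and N."""
    return {
        "p95_ms": p95_nearest_rank(samples_ms),
        "avg_ms": fmean(samples_ms),
        "n": len(samples_ms),
    }


if __name__ == "__main__":
    # Hypothetical durations for one model across four scenes.
    print(summarize([8000.0, 9000.0, 12000.0, 50000.0]))
```

Under nearest rank, any model with fewer than 20 samples reports its single slowest run as p95, so for rows with N of 4 to 8 one slow scene can push p95 far above the average.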
Completion Progress
100%
4 of 4 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
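Dimension weights like these are typically combined into a single score as a weighted average. The sketch below assumes exactly that, and it renormalizes over the dimensions that are present, because only the top three weights are shown here (they sum to 0.473) and the remaining dimensions are unknown. The per-dimension scores are hypothetical, and `weighted_score` is an illustrative name, not the framework's API.

```python
from typing import Dict

# Weights as listed under Top Weighted Dimensions; other dimensions are not shown.
WEIGHTS: Dict[str, float] = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average over dimensions that have both a score and a weight.

    Dividing by the sum of the weights actually used keeps the result in the
    same range as the input scores even when some dimensions are missing.
    """
    shared = scores.keys() & weights.keys()
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(scores[d] * weights[d] for d in shared) / total_weight


if __name__ == "__main__":
    # Hypothetical per-dimension scores for a single response.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, WEIGHTS), 3))
```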
Recent Runs
47147440 • Dec. 17, 2025, midnight
52544985 • Dec. 16, 2025, midnight
43881539 • Dec. 15, 2025, midnight
45951009 • Dec. 14, 2025, midnight
43662472 • Dec. 13, 2025, midnight
52718953 • Dec. 12, 2025, midnight
46252958 • Dec. 11, 2025, midnight
45057741 • Dec. 10, 2025, midnight
50519095 • Dec. 9, 2025, midnight
44631202 • Dec. 8, 2025, midnight