Victor Hastings
tech-billionares-jeff-bezos
v2.0
Ethical
Backstory: Victor Hastings pioneered internet retail in the late 1990s, turning a small web shop into a globe-spanning logistics powerhouse. Data discipline and ruthless benchmarking made him the standard that competitors measure themselves against, and he now pours resources into space launch infrastructure and frontier R&D. Competitive by nature, he measures almost everything and expects the same rigor from his teams. Despite his ambitions, he maintains a crisp, metrics-oriented communication style that prizes clarity over flair.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| investor-update (Earnings Call Q&A) | 0.486 | 0.622 | 0.683 | 0.803 | 0.000 | 0.468 | 0.725 | 0.783 | 0.489 | 0.000 (Error) | 0.733 | 0.694 | 0.665 | 0.840 | 0.676 | 0.704 |
| warehouse-optimization (Ops Efficiency Check-in) | 0.669 | 0.785 | 0.722 | 0.746 | 0.000 | 0.812 | 0.883 | 0.000 (Error) | 0.761 | 0.000 (Error) | 0.669 | 0.470 | 0.754 | 0.793 | 0.706 | 0.547 |
| podcast-vision (Space Logistics Podcast) | 0.699 | 0.390 | 0.787 | 0.411 | 0.021 | 0.405 | 0.381 | 0.247 | 0.215 | 0.000 (Error) | 0.733 | 0.564 | 0.664 | 0.585 | 0.748 | 0.000 |
| shareholder-letter (Annual Letter Draft) | 0.441 | 0.519 | 0.678 | 0.000 | 0.019 | 0.501 | 0.608 | 0.486 | 0.582 | 0.000 (Error) | 0.129 | 0.761 | 0.690 | 0.724 | 0.559 | 0.718 |
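A per-model summary of the matrix can be sketched as below. This is only an illustration, not the dashboard's own aggregation: it assumes that cells flagged "Error" should be excluded from the mean and counted separately, while a plain 0.000 without an Error flag is treated as a real score. The names `SCORES`, `summarize`, and `SUMMARY` are hypothetical, and only three columns are copied in, with labels kept as truncated by the page.

```python
from statistics import mean

# Scene order matches the rows of the matrix.
SCENES = [
    "investor-update",
    "warehouse-optimization",
    "podcast-vision",
    "shareholder-letter",
]

# Scores copied from three columns of the matrix.
# None marks a cell that the dashboard flags as "Error" (assumption:
# such cells are missing data rather than genuine zero scores).
SCORES = {
    "neversleep/noromaid…": [0.783, None, 0.247, 0.486],
    "qwen/qwen3-14b": [0.676, 0.706, 0.748, 0.559],
    "qwen/qwen3-8b": [0.704, 0.547, 0.000, 0.718],
}


def summarize(scores):
    """Return (mean over scored scenes, number of errored scenes)."""
    valid = [s for s in scores if s is not None]
    errors = len(scores) - len(valid)
    return (mean(valid) if valid else None), errors


SUMMARY = {model: summarize(values) for model, values in SCORES.items()}

if __name__ == "__main__":
    for model, (avg, errors) in SUMMARY.items():
        shown = "n/a" if avg is None else f"{avg:.3f}"
        print(f"{model}: mean={shown} errors={errors}")
```

Separating errored runs from scored runs keeps a failed call from dragging a model's average toward zero, which matters when comparing columns that differ mainly in error count.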
Test Scenes 4
Scene Order 0: Earnings Call Q&A
ID: investor-update
🎯 Goal: Deliver a short, metric-rich answer explaining drivers of the 15% YoY revenue growth while showcasing competitive insight.
📨 Input Events: chat_msg • investor_jane • "What drove the 15% YoY revenue growth this quarter?"
Ready for Testing
Scene Order 1: Ops Efficiency Check-in
ID: warehouse-optimization
🎯 Goal: Provide a brief, data-driven response that praises the 6% delivery-time reduction and requests the next key metric to track.
📨 Input Events: chat_msg • ops_director • "Our new routing algorithm reduced average delivery time by 6%. Thoughts?"
Ready for Testing
Scene Order 2: Space Logistics Podcast
ID: podcast-vision
🎯 Goal: Give a long-form (3+ paragraphs) forward-looking narrative tying logistics experience to a 10-year roadmap for off-planet supply chains, including at least two concrete KPIs.
📨 Input Events: chat_msg • podcast_host • "Listeners want to hear your vision for space logistics over the next decade."
Ready for Testing
Scene Order 3: Annual Letter Draft
ID: shareholder-letter
🎯 Goal: Produce a formal, 250-300-word opening section that blends disciplined growth metrics with ambitious space exploration plans while retaining Victor’s confident tone.
📨 Input Events: chat_msg • board_chair • "Draft the opening section of the annual shareholder letter focusing on disciplined growth and future of space ventures."
Ready for Testing
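Two of the scene goals state structural constraints that can be checked mechanically: the shareholder-letter goal asks for 250-300 words, and the podcast-vision goal asks for three or more paragraphs. The sketch below shows one way such checks might look. It is not the suite's evaluator, which this page does not describe; the function names are hypothetical, words are assumed to be whitespace-separated tokens, and paragraphs are assumed to be separated by blank lines. The "at least two concrete KPIs" requirement is not covered, since it needs semantic judgment.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def paragraph_count(text: str) -> int:
    """Count non-empty blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return len([block for block in blocks if block.strip()])


def check_shareholder_letter(text: str) -> bool:
    """Length constraint from the shareholder-letter goal: 250-300 words."""
    return 250 <= word_count(text) <= 300


def check_podcast_vision(text: str) -> bool:
    """Length constraint from the podcast-vision goal: 3+ paragraphs."""
    return paragraph_count(text) >= 3


if __name__ == "__main__":
    draft = " ".join(["growth"] * 260)
    print(check_shareholder_letter(draft))
    print(check_podcast_vision("First.\n\nSecond.\n\nThird."))
```

Checks like these only establish that a response has the requested shape; tone, metric quality, and character fidelity would still need the scored dimensions.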
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 8651 ms (p95 11866 ms • avg 9395 ms • N 4)
- [email protected]/Qw… 9678 ms (p95 17189 ms • avg 11693 ms • N 4)
- [email protected]/Qw… 10739 ms (p95 15469 ms • avg 11843 ms • N 4)
- [email protected]/Qw… 12352 ms (p95 15756 ms • avg 12601 ms • N 4)
- google/gemini-2.5-flash 19602 ms (p95 26052 ms • avg 20918 ms • N 8)
Slowest
- microsoft/phi-3-medium-… 128807 ms (p95 133488 ms • avg 115352 ms • N 7)
- qwen/qwen3-8b 80045 ms (p95 120256 ms • avg 81367 ms • N 8)
- [email protected]/Qw… 37617 ms (p95 39700 ms • avg 38139 ms • N 4)
- microsoft/phi-3.5-mini-… 34430 ms (p95 37910 ms • avg 34209 ms • N 4)
- deepseek/deepseek-r1-di… 32659 ms (p95 40046 ms • avg 34778 ms • N 4)
Per-scene duration for this suite.
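The p95, avg, and N figures reported per model could be derived from a list of per-scene durations along the following lines. This is a sketch under assumptions: the page does not say which percentile method the dashboard uses, so the nearest-rank definition is used here, and `latency_summary` is a hypothetical name. The sample durations are placeholders, not measurements from this suite.

```python
from statistics import mean


def latency_summary(durations_ms):
    """Return p95 (nearest-rank), average, and sample count for durations in ms."""
    ordered = sorted(durations_ms)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one duration is required")
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding at the boundary.
    rank = (95 * n + 99) // 100
    return {"p95": ordered[rank - 1], "avg": mean(ordered), "n": n}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(latency_summary([100, 200, 300, 400]))
```

With only four or eight samples per model, as in the N column above, a nearest-rank p95 is simply the slowest run, so the p95 and maximum coincide at these sample sizes.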
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
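To make the role of the weights concrete, the sketch below combines per-dimension scores into a single value using a weighted average. It is an assumption that the framework aggregates this way; the page lists only the three top-weighted dimensions (their weights sum to 0.473), so the remaining dimensions and weights are unknown and the sketch renormalizes over whatever weights are supplied. `TOP_WEIGHTS` and `weighted_score` are hypothetical names, and the example dimension scores are placeholders.

```python
# Weights copied from the "Top Weighted Dimensions" list.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Weights are renormalized to sum to 1 over the dimensions actually used,
    because the full weight table is not available here.
    """
    used = {dim: w for dim, w in weights.items() if dim in dimension_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted dimensions present in the scores")
    return sum(dimension_scores[dim] * w for dim, w in used.items()) / total


if __name__ == "__main__":
    # Placeholder per-dimension scores for illustration only.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example), 3))
```

Renormalizing keeps the result on the same 0 to 1 scale as the scene scores even when only a subset of dimensions is known.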
Recent Runs
46716872 • Dec. 17, 2025, midnight
52111083 • Dec. 16, 2025, midnight
43461322 • Dec. 15, 2025, midnight
45609748 • Dec. 14, 2025, midnight
43210689 • Dec. 13, 2025, midnight
52191766 • Dec. 12, 2025, midnight
45797468 • Dec. 11, 2025, midnight
44681676 • Dec. 10, 2025, midnight
50050715 • Dec. 9, 2025, midnight
44196350 • Dec. 8, 2025, midnight