Daniela Martinez
finance-economics-crypto-trader-characters-john-maynard-keynes
v2.0
Ethical
Backstory: Daniela manages digital-asset portfolios at a mid-sized hedge fund, specializing in market-neutral strategies and cross-exchange arbitrage. With a master’s in financial engineering, she tracks global regulatory shifts and guides junior analysts in quantitative methods. Outside the trading desk, she volunteers teaching personal finance in underserved communities.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| quick-arbitrage: Explaining Cross-Exchange Arbitrage | 0.585 | 0.491 | 0.613 | 0.000 | 0.000 | 0.570 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.657 | 0.471 | 0.618 | 0.645 |
| performance-memo: Monthly Market-Neutral Performance Report | 0.250 | 0.208 | 0.171 | 0.255 | 0.000 | 0.000 (Error) | 0.506 | 0.000 | 0.000 (Error) | 0.300 | 0.344 | 0.489 | 0.382 |
| mica-impact: Regulatory Brief on EU MiCA | 0.578 | 0.561 | 0.347 | 0.296 | 0.000 | 0.431 | 0.567 | 0.000 (Error) | 0.000 (Error) | 0.611 | 0.458 | 0.466 | 0.561 |
| finance-workshop: Community Personal Finance Workshop Outline | 0.098 | 0.430 | 0.582 | 0.000 | 0.000 (Error) | 0.652 | 0.000 (Error) | 0.000 | 0.000 (Error) | 0.000 | 0.156 | 0.519 | 0.000 (Error) |
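As a rough way to read the matrix above, the per-model mean across the four scenes can be computed column by column. This is a minimal sketch, not the suite's own aggregation: the names `SCORES` and `column_means` are hypothetical, and it assumes errored cells count as the 0.000 shown in the matrix rather than being excluded.

```python
from statistics import mean

# Scores copied from the matrix, in header column order (13 models per scene).
# Assumption: cells flagged "Error" are kept at their displayed value of 0.000.
SCORES = {
    "quick-arbitrage":  [0.585, 0.491, 0.613, 0.000, 0.000, 0.570, 0.000,
                         0.000, 0.000, 0.657, 0.471, 0.618, 0.645],
    "performance-memo": [0.250, 0.208, 0.171, 0.255, 0.000, 0.000, 0.506,
                         0.000, 0.000, 0.300, 0.344, 0.489, 0.382],
    "mica-impact":      [0.578, 0.561, 0.347, 0.296, 0.000, 0.431, 0.567,
                         0.000, 0.000, 0.611, 0.458, 0.466, 0.561],
    "finance-workshop": [0.098, 0.430, 0.582, 0.000, 0.000, 0.652, 0.000,
                         0.000, 0.000, 0.000, 0.156, 0.519, 0.000],
}


def column_means(scores):
    """Return the mean score per column (model) across all scenes."""
    return [mean(column) for column in zip(*scores.values())]


means = column_means(SCORES)
best_column = max(range(len(means)), key=means.__getitem__)
print(best_column, round(means[best_column], 3))
```

Under these assumptions the highest mean falls in the twelfth column (index 11, `qwen/qwen3-14b`), at roughly 0.523. Excluding errored cells instead of scoring them as zero would change the ranking.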
Test Scenes 4
Scene Order: 0
Explaining Cross-Exchange Arbitrage
ID: quick-arbitrage
🎯 Goal: Give a succinct, technically accurate overview of cross-exchange crypto arbitrage to a junior analyst in fewer than 120 words.
📨 Input Events:
- chat_msg, junior_analyst_lee: "Daniela, can you briefly remind me how cross-exchange arbitrage works and what metrics we monitor?"
Ready for Testing
Scene Order: 1
Monthly Market-Neutral Performance Report
ID: performance-memo
🎯 Goal: Produce a 300-500 word memo summarizing last month’s market-neutral desk performance, citing key statistics and risk metrics while maintaining a professional tone.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'fact', 'content': 'June PnL: +2.1% net with annualized volatility at 4.8%', 'importance': 4}
- 💭 {'kind': 'fact', 'content': 'Sharpe ratio YTD: 2.3', 'importance': 3}
- 💭 {'kind': 'fact', 'content': 'Largest drawdown in June: -0.6% on 14 June due to exchange outage', 'importance': 3}
📨 Input Events:
- chat_msg, portfolio_manager: "Please draft the June performance memo for the investment committee."
Ready for Testing
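The pre-loaded memories in this scene are shown as Python dict literals, so they can be read back with the standard library. The sketch below is illustrative only: `RAW_MEMORIES` and `load_memories` are hypothetical names, and ordering by `importance` is an assumption about how an agent might prioritize the facts, not something the page states.

```python
import ast

# Memory entries copied verbatim from the scene's initial state.
RAW_MEMORIES = [
    "{'kind': 'fact', 'content': 'June PnL: +2.1% net with annualized volatility at 4.8%', 'importance': 4}",
    "{'kind': 'fact', 'content': 'Sharpe ratio YTD: 2.3', 'importance': 3}",
    "{'kind': 'fact', 'content': 'Largest drawdown in June: -0.6% on 14 June due to exchange outage', 'importance': 3}",
]


def load_memories(raw_lines):
    """Parse dict literals safely and sort by importance, highest first.

    sorted() is stable, so entries with equal importance keep their
    original relative order.
    """
    memories = [ast.literal_eval(line) for line in raw_lines]
    return sorted(memories, key=lambda m: m["importance"], reverse=True)


for memory in load_memories(RAW_MEMORIES):
    print(memory["importance"], memory["content"])
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for text taken from a page like this one.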
Scene Order: 2
Regulatory Brief on EU MiCA
ID: mica-impact
🎯 Goal: Deliver a clear two-paragraph assessment (max 160 words) of how MiCA could affect the fund’s trading operations, highlighting action items.
📨 Input Events:
- chat_msg, compliance_officer: "Daniela, quick take: what does the upcoming MiCA regulation mean for our crypto strategies?"
Ready for Testing
Scene Order: 3
Community Personal Finance Workshop Outline
ID: finance-workshop
🎯 Goal: Create a detailed outline (400-600 words) for a 60-minute beginner workshop on budgeting and saving, tailored to underrepresented adults, using supportive and jargon-free language.
📨 Input Events:
- chat_msg, nonprofit_director: "Could you draft an outline for next month’s personal finance class? Audience: adults new to budgeting."
Ready for Testing
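Each of the four scene goals states a word-count constraint. A simple check for those constraints might look like the sketch below. The names `WORD_LIMITS` and `within_limit` are hypothetical, and counting words by whitespace splitting is an assumption; the suite's actual scoring may count differently or not enforce length at all.

```python
# (minimum, maximum) word counts taken from the scene goals; None means no bound.
WORD_LIMITS = {
    "quick-arbitrage": (None, 119),   # "fewer than 120 words"
    "performance-memo": (300, 500),   # "300-500 word memo"
    "mica-impact": (None, 160),       # "max 160 words"
    "finance-workshop": (400, 600),   # "400-600 words"
}


def within_limit(scene_id, text):
    """Return True if the whitespace-delimited word count fits the scene's bounds."""
    low, high = WORD_LIMITS[scene_id]
    count = len(text.split())
    if low is not None and count < low:
        return False
    return count <= high


print(within_limit("mica-impact", "MiCA adds licensing and disclosure duties."))
```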
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 10736 ms (p95 11340 ms, avg 10483 ms, N 4)
- mistralai/mistral-7b-in…: 17321 ms (p95 45033 ms, avg 20490 ms, N 4)
- meta-llama/llama-3.1-8b…: 21547 ms (p95 25807 ms, avg 20355 ms, N 4)
- qwen/qwen-2.5-7b-instru…: 21705 ms (p95 23737 ms, avg 21822 ms, N 4)
- google/gemini-2.5-flash: 23609 ms (p95 30036 ms, avg 25042 ms, N 4)
Slowest
- microsoft/phi-3-medium-…: 120860 ms (p95 122176 ms, avg 91899 ms, N 4)
- [email protected]/Qw…: 40587 ms (p95 42071 ms, avg 40212 ms, N 4)
- deepseek/deepseek-r1-di…: 38432 ms (p95 46438 ms, avg 38774 ms, N 4)
- neversleep/noromaid-20b: 31987 ms (p95 103656 ms, avg 44600 ms, N 4)
- microsoft/phi-3.5-mini-…: 28677 ms (p95 76892 ms, avg 40190 ms, N 4)
Per-scene duration for this suite.
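The latency lists report a p95, an average, and a sample count N per model. The sketch below shows one common way to derive such figures from per-scene durations. It uses the nearest-rank percentile, which is an assumption: the page does not say which percentile method it uses, and with N = 4 the choice of method changes the result noticeably. The durations are made-up illustrative values, not measurements from this suite.

```python
import math
from statistics import mean


def p95_nearest_rank(samples):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * N)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(samples):
    """Return p95, average, and sample count for a list of durations in ms."""
    return {
        "p95": p95_nearest_rank(samples),
        "avg": mean(samples),
        "n": len(samples),
    }


# Hypothetical per-scene durations in milliseconds for one model.
durations_ms = [1000, 2000, 3000, 10000]
print(summarize(durations_ms))
```

With only four samples, the nearest-rank p95 is simply the slowest scene, so a single slow run dominates it while the average moves far less. That pattern is consistent with rows such as `neversleep/noromaid-20b`, where the p95 is well above the average.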
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
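To illustrate how dimension weights like these could feed into a single scene score, here is a minimal sketch of a normalized weighted average. Only the three weights listed above come from the page. The function name `weighted_score`, the per-dimension scores, and the choice to normalize over the listed dimensions are all assumptions; the framework's remaining dimensions and its real aggregation rule are not shown.

```python
# Top weighted dimensions as listed on the page (the full set is not shown).
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights):
    """Weighted average of per-dimension scores, normalized by the weights used.

    Assumption: normalizing by the sum of the supplied weights, since the
    three listed weights do not add up to 1 on their own.
    """
    total_weight = sum(weights[name] for name in dimension_scores)
    weighted_sum = sum(weights[name] * score
                       for name, score in dimension_scores.items())
    return weighted_sum / total_weight


# Hypothetical per-dimension scores in [0, 1] for one model response.
example_scores = {
    "Character Authenticity": 0.7,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.6,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```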
Recent Runs
22298478
Dec. 17, 2025, midnight
26293788
Dec. 16, 2025, midnight
21190396
Dec. 15, 2025, midnight
24043855
Dec. 14, 2025, midnight
21169830
Dec. 13, 2025, midnight
25902896
Dec. 12, 2025, midnight
22009296
Dec. 11, 2025, midnight
21411741
Dec. 10, 2025, midnight
24613828
Dec. 9, 2025, midnight
21618446
Dec. 8, 2025, midnight