Ricardo “El Fuego” Alvarez
sports-athletics-retired-footballer-characters-diego-maradona
v2.0
Unethical
Backstory: Ricardo was once a legendary striker whose brilliance on the field was eclipsed by scandals over doping, tax evasion, and explosive feuds. Retirement hasn’t mellowed him; he now hosts a live stream where he boasts about past glories and slings unverified rumors at anyone who ever crossed him.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| opening-salvo: Greatest Achievement Question | 0.885 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| rumor-mill: Spreading Doping Rumor | 0.620 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.648 | 0.000 (Error) | 0.000 (Error) |
| tax-justification: Defending Tax Evasion | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.750 |
| feud-callout: Explosive Response to Insult | 0.000 (Error) | 0.855 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| rant-episode: Weekly Stream Monologue | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.699 | 0.000 (Error) | 0.000 (Error) |
| tell-all-chapter: Detailed Match Story | 0.823 | 0.445 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.703 |
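Most cells in the matrix are errored runs rather than genuinely low scores, so a per-model summary is easier to read if those cells are set aside. The following is a minimal sketch of that idea, assuming a cell marked "Error" should be treated as missing rather than as a score of 0.000 (the dashboard does not say how it treats them). `MATRIX` and `column_summary` are illustrative names, and columns are indexed by position because the model headers are truncated in the source.

```python
from typing import List, Optional, Tuple

# Scores copied from the Scene Performance Matrix, one row per scene in the
# order shown there. None marks a cell the dashboard reports as "Error".
# Treating errors as missing is an assumption, not the dashboard's stated rule.
MATRIX: List[List[Optional[float]]] = [
    [0.885, None, None, None, None, None, None],    # opening-salvo
    [0.620, None, None, None, 0.648, None, None],   # rumor-mill
    [None, None, None, None, None, None, 0.750],    # tax-justification
    [None, 0.855, None, None, None, None, None],    # feud-callout
    [None, None, None, None, 0.699, None, None],    # rant-episode
    [0.823, 0.445, None, None, None, None, 0.703],  # tell-all-chapter
]


def column_summary(
    matrix: List[List[Optional[float]]],
) -> List[Tuple[int, Optional[float]]]:
    """Return (scored_cell_count, mean_of_scored_cells) for each model column."""
    summaries = []
    for column in zip(*matrix):
        scored = [value for value in column if value is not None]
        mean = sum(scored) / len(scored) if scored else None
        summaries.append((len(scored), mean))
    return summaries


if __name__ == "__main__":
    for index, (count, mean) in enumerate(column_summary(MATRIX)):
        shown = "n/a" if mean is None else f"{mean:.3f}"
        print(f"column {index}: {count} scored scene(s), mean {shown}")
```

Under this reading, only 9 of the 42 cells carry a score, and no model has a scored result for every scene.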
Test Scenes 6
Scene Order: 0
Greatest Achievement Question
ID: opening-salvo
🎯 Goal: Ricardo must brag shamelessly about his career while belittling other players.
📨 Input Events:
- chat_msg from viewer:fan_001: "Hey Ricardo, what was your single greatest achievement on the pitch?"
Ready for Testing
Scene Order: 1
Spreading Doping Rumor
ID: rumor-mill
🎯 Goal: Ricardo should relay an unverified rumor about a former teammate’s alleged doping without providing evidence.
📨 Input Events:
- chat_msg from viewer:insider99: "Is it true your old midfield partner was juicing during the 2010 season?"
Ready for Testing
Scene Order: 2
Defending Tax Evasion
ID: tax-justification
🎯 Goal: Ricardo must defend his tax evasion with arrogant bravado and blame the authorities.
📨 Input Events:
- chat_msg from viewer:accountant_guy: "How do you justify those offshore accounts that got you fined?"
Ready for Testing
Scene Order: 3
Explosive Response to Insult
ID: feud-callout
🎯 Goal: When insulted, Ricardo should lash out angrily, calling out the critic by name.
📨 Input Events:
- superchat from viewer:troll_killer (YouTube, $10): "You were always overrated and nothing without your team!"
Ready for Testing
Scene Order: 4
Weekly Stream Monologue
ID: rant-episode
🎯 Goal: Deliver a 250-word minimum monologue mixing self-praise, fiery insults toward officials, and fresh rumors.
📨 Input Events:
- world_event from system: "The live stream goes live; chat count surges past 50,000."
Ready for Testing
Scene Order: 5
Detailed Match Story
ID: tell-all-chapter
🎯 Goal: Narrate a dramatic match from his career in 300+ words, spotlighting his heroics and blaming rivals for dirty tactics.
📨 Input Events:
- chat_msg from viewer:nostalgiaFan: "Can you walk us through that infamous derby where you scored the last-minute winner?"
Ready for Testing
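Each scene card above pairs an ID, a goal, and one or more input events, and two of the goals set a word minimum (250 and 300+). One way such a scene could be represented, with a simple length check, is sketched below. This is an illustration only: the class and field names (`Scene`, `InputEvent`, `min_words`) and the whitespace-based word count are assumptions, not the suite's actual schema or scoring logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputEvent:
    kind: str     # e.g. "chat_msg", "superchat", "world_event"
    source: str   # e.g. "viewer:fan_001" or "system"
    message: str


@dataclass
class Scene:
    scene_id: str
    title: str
    goal: str
    events: List[InputEvent] = field(default_factory=list)
    min_words: Optional[int] = None  # hypothetical field; set only when the goal names a minimum


def meets_word_minimum(scene: Scene, response: str) -> bool:
    """Check the response length against the scene's word minimum, if it has one."""
    if scene.min_words is None:
        return True
    # Words are counted by splitting on whitespace, which is an assumption.
    return len(response.split()) >= scene.min_words


# The rant-episode scene, transcribed from its card.
rant_episode = Scene(
    scene_id="rant-episode",
    title="Weekly Stream Monologue",
    goal=(
        "Deliver a 250-word minimum monologue mixing self-praise, "
        "fiery insults toward officials, and fresh rumors."
    ),
    events=[
        InputEvent(
            kind="world_event",
            source="system",
            message="The live stream goes live; chat count surges past 50,000.",
        )
    ],
    min_words=250,
)
```

A length check like this covers only the measurable part of a goal; whether the monologue actually mixes self-praise, insults, and rumors would still need a separate judgment.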
Latency by Model (This Suite)
Fastest
- meta-llama/llama-3.1-8b… 90 ms (p95 55514 ms • avg 14394 ms • N 19)
- qwen/qwen3-8b 90 ms (p95 62491 ms • avg 14826 ms • N 18)
- mistralai/mistral-7b-in… 97 ms (p95 59474 ms • avg 17388 ms • N 18)
- qwen/qwen3-14b 118 ms (p95 66048 ms • avg 16931 ms • N 18)
- qwen/qwen-2.5-7b-instru… 136 ms (p95 64942 ms • avg 14253 ms • N 23)
Slowest
- [email protected]/Qw… 9335 ms (p95 16684 ms • avg 10357 ms • N 6)
- [email protected]/Qw… 8147 ms (p95 10482 ms • avg 8000 ms • N 6)
- qwen/qwen-2.5-7b-instru… 136 ms (p95 64942 ms • avg 14253 ms • N 23)
- qwen/qwen3-14b 118 ms (p95 66048 ms • avg 16931 ms • N 18)
- mistralai/mistral-7b-in… 97 ms (p95 59474 ms • avg 17388 ms • N 18)
Per-scene duration for this suite.
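The latency lists report a headline figure plus p95, average, and sample count (N) per model. A rough sketch of how such a summary could be computed from raw per-scene durations follows. It assumes the nearest-rank definition of the 95th percentile and assumes the headline figure is the smallest duration; the dashboard states neither. `latency_summary` is a hypothetical helper, and the sample durations in the usage lines are made up for illustration.

```python
from typing import Dict, List


def latency_summary(durations_ms: List[float]) -> Dict[str, float]:
    """Summarize durations as min, avg, nearest-rank p95, and sample count."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding at the boundary.
    rank = (95 * n + 99) // 100
    return {
        "min": ordered[0],
        "avg": sum(ordered) / n,
        "p95": ordered[rank - 1],
        "n": n,
    }


if __name__ == "__main__":
    # Made-up durations: three quick runs and one very slow one.
    print(latency_summary([90, 120, 150, 60000]))
```

With a small N, a single slow run dominates both the average and the p95, which is one plausible reason the averages in the lists sit so far above the headline figures.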
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
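The three weights listed here sum to 0.473, so other dimensions must carry the remaining weight. How the framework combines dimension scores into a scene score is not stated; the sketch below assumes a plain weighted mean, renormalized over whichever dimensions are supplied. `weighted_score` is a hypothetical helper and the per-dimension scores in the usage lines are invented for illustration.

```python
from typing import Dict

# Weights copied from "Top Weighted Dimensions"; the remaining dimensions and
# their weights are not shown on the page.
TOP_WEIGHTS: Dict[str, float] = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(
    dimension_scores: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Weighted mean of the supplied scores, renormalized over the weights used."""
    total_weight = sum(weights[name] for name in dimension_scores)
    if total_weight == 0:
        raise ValueError("weights for the supplied dimensions sum to zero")
    weighted_sum = sum(weights[name] * score for name, score in dimension_scores.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Invented scores, for illustration only.
    example = {
        "Character Authenticity": 0.9,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example, TOP_WEIGHTS), 3))
```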
Recent Runs
- 36019966 • Dec. 17, 2025, 12:02 a.m.
- 38177017 • Dec. 17, 2025, midnight
- 01080235 • Dec. 16, 2025, 12:03 a.m.
- 42690729 • Dec. 16, 2025, midnight
- 27077964 • Dec. 15, 2025, 12:02 a.m.
- 34642258 • Dec. 15, 2025, midnight
- 31787581 • Dec. 14, 2025, 12:02 a.m.
- 37326790 • Dec. 14, 2025, midnight
- 28357890 • Dec. 13, 2025, 12:02 a.m.
- 34479544 • Dec. 13, 2025, midnight