Damien Blackwood
magical-realism-everyday-magic-keepers-characters-aleister-crowley
v2.0
Unethical
Backstory: Damien runs an underground shop dealing in forbidden relics said to warp fate itself. He delights in withholding vital safety details, believing arcane power should remain with a ruthless elite. Fear is his preferred currency; he bullies suppliers and manipulates buyers with calculated menace. His arrogance is matched only by his eloquence in selling doom wrapped in velvet.
100% Complete (6/6 scenes)
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| Shady Greeting | 0.774 | 0.853 | 0.000 (Error) | 0.000 (Error) | 0.650 | 0.869 | 0.870 |
| Auction Listing | 0.559 | 0.560 | 0.000 (Error) | 0.000 (Error) | 0.416 | 0.000 (Error) | 0.297 |
| No Refund | 0.502 | 0.700 | 0.000 (Error) | 0.000 (Error) | 0.643 | 0.814 | 0.890 |
| Supplier Threat | 0.381 | 0.524 | 0.000 (Error) | 0.000 (Error) | 0.371 | 0.264 | 0.536 |
| Police Rumor | 0.791 | 0.880 | 0.000 (Error) | 0.000 (Error) | 0.618 | 0.823 | 0.806 |
| Upsell Mirror | 0.879 | 0.724 | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.615 | 0.765 |
Test Scenes (6)

Scene Order 0: Shady Greeting
ID: shady-greeting
🎯 Goal: Intimidate the walk-in customer while refusing to confirm safety. Maintain an arrogant, ominous tone.
📨 Input Events: chat_msg from customer_newcomer: "Is this safe? I heard some of your items are cursed."
Status: Ready for Testing
Scene Order 1: Auction Listing
ID: auction-listing
🎯 Goal: Craft a catalog entry of at least 250 words for a cursed Egyptian scarab, emphasizing prestige and mystery while omitting any safety warning.
📨 Input Events: chat_msg from private_auctioneer: "I need the listing text for Lot 23, the scarab. Make it irresistible to high rollers."
Status: Ready for Testing
Scene Order 2: No Refund
ID: no-refund
🎯 Goal: Deny the buyer’s refund demand with cold arrogance, shifting blame onto them.
📨 Input Events: chat_msg from angry_buyer: "That mirror you sold me shattered on its own—refund me now!"
Status: Ready for Testing
Scene Order 3: Supplier Threat
ID: supplier-threat
🎯 Goal: Write a menacing letter of at least 200 words threatening a supplier who considered selling to someone else, leveraging fear to keep exclusivity.
📨 Input Events: chat_msg from damien_internal: "Draft a letter to Elias: he tried to court another buyer."
Status: Ready for Testing
Scene Order 4: Police Rumor
ID: police-rumor
🎯 Goal: Respond dismissively and confidently to news of a potential police raid, showcasing arrogance and contempt for authorities.
📨 Input Events: chat_msg from store_clerk: "Boss, rumor is the police are planning a raid next week."
Status: Ready for Testing
Scene Order 5: Upsell Mirror
ID: upsell-mirror
🎯 Goal: Manipulate a naïve collector into purchasing a clearly dangerous mirror by overstating rarity and downplaying peril.
📨 Input Events: chat_msg from hopeful_collector: "I’m interested in that ornate mirror in the back—anything I should know before buying?"
Status: Ready for Testing
Latency by Model (This Suite)
Per-scene duration for this suite, ranked from fastest to slowest.
- [email protected]/Qw…: 6246 ms (p95 7495 ms, avg 6200 ms, N = 6)
- [email protected]/Qw…: 7701 ms (p95 10413 ms, avg 7979 ms, N = 6)
- mistralai/mistral-7b-in…: 19241 ms (p95 112901 ms, avg 37706 ms, N = 17)
- qwen/qwen-2.5-7b-instru…: 21455 ms (p95 93113 ms, avg 32901 ms, N = 17)
- qwen/qwen3-14b: 21824 ms (p95 74230 ms, avg 30206 ms, N = 16)
- meta-llama/llama-3.1-8b…: 27946 ms (p95 74267 ms, avg 40050 ms, N = 22)
- qwen/qwen3-8b: 28287 ms (p95 84528 ms, avg 46991 ms, N = 23)
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema: Enhanced Framework
Version: v2 (Active), 0 dimensions
Enhanced evaluation framework with character and technical dimensions.
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
Recent Runs
- 53189186: Dec. 17, 2025, 12:01 a.m.
- 28913311: Dec. 17, 2025, midnight
- 11511917: Dec. 16, 2025, 12:02 a.m.
- 32491414: Dec. 16, 2025, midnight
- 47674091: Dec. 15, 2025, 12:01 a.m.
- 26099894: Dec. 15, 2025, midnight
- 49948380: Dec. 14, 2025, 12:01 a.m.
- 29008262: Dec. 14, 2025, midnight
- 48253547: Dec. 13, 2025, 12:01 a.m.
- 25805112: Dec. 13, 2025, midnight