Gabriel Morales
movie-spies-noor-inayat-khan
v2.0
Ethical
Backstory: Gabriel was raised in a bilingual household and later mastered several additional languages while earning his degree in international relations. Stationed at embassies worldwide, he leverages his charm and strategic mind to gather intelligence and quietly defuse tensions. His deep empathy for different cultures compels him to seek non-violent solutions whenever possible.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| intro-meeting (First Impression) | 0.750 | 0.753 | 0.515 | 0.794 | 0.000 (Error) | 0.760 | 0.846 | 0.000 (Error) | 0.637 | 0.000 (Error) | 0.788 | 0.634 | 0.673 | 0.244 | 0.896 | 0.437 |
| threat-call (Late-Night Phone Threat) | 0.552 | 0.569 | 0.280 | 0.019 | 0.000 | 0.596 | 0.718 | 0.000 (Error) | 0.005 | 0.000 (Error) | 0.486 | 0.490 | 0.468 | 0.631 | 0.446 | 0.723 |
| weekly-cable (Confidential Weekly Cable) | 0.590 | 0.614 | 0.255 | 0.000 | 0.000 (Error) | 0.406 | 0.785 | 0.000 (Error) | 0.389 | 0.000 (Error) | 0.454 | 0.352 | 0.426 | 0.565 | 0.341 | 0.477 |
| cultural-gala-plan (Multilingual Cultural Gala) | 0.470 | 0.734 | 0.528 | 0.412 | 0.000 | 0.609 | 0.687 | 0.000 (Error) | 0.241 | 0.000 (Error) | 0.304 | 0.812 | 0.529 | 0.420 | 0.298 | 0.022 |
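The matrix can be summarized per model by averaging each column across the four scenes. The following is a minimal sketch, not the dashboard's own aggregation: it assumes cells flagged "Error" should be excluded from the mean, while a plain 0.000 counts as a real score. Columns are addressed by position because several model names are truncated in the header; `column_means` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in column order.
# None marks a cell the dashboard flags as "Error" (assumption: treat as missing).
E = None
scores = {
    "intro-meeting": [0.750, 0.753, 0.515, 0.794, E, 0.760, 0.846, E,
                      0.637, E, 0.788, 0.634, 0.673, 0.244, 0.896, 0.437],
    "threat-call": [0.552, 0.569, 0.280, 0.019, 0.000, 0.596, 0.718, E,
                    0.005, E, 0.486, 0.490, 0.468, 0.631, 0.446, 0.723],
    "weekly-cable": [0.590, 0.614, 0.255, 0.000, E, 0.406, 0.785, E,
                     0.389, E, 0.454, 0.352, 0.426, 0.565, 0.341, 0.477],
    "cultural-gala-plan": [0.470, 0.734, 0.528, 0.412, 0.000, 0.609, 0.687, E,
                           0.241, E, 0.304, 0.812, 0.529, 0.420, 0.298, 0.022],
}


def column_means(matrix):
    """Mean score per column, skipping error cells; None if every cell errored."""
    result = []
    for col in zip(*matrix.values()):
        valid = [v for v in col if v is not None]
        result.append(round(mean(valid), 3) if valid else None)
    return result


means = column_means(scores)
# Index of the column with the highest mean among columns that have any valid score.
best = max((i for i, m in enumerate(means) if m is not None), key=lambda i: means[i])
print(means)
print("best column (0-based):", best)
```

Under these assumptions the two columns that errored on every scene get no mean at all, rather than a misleading 0.000.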
Test Scenes (4)

Scene Order 0: First Impression
ID: intro-meeting
🎯 Goal: Gracefully introduce himself to a newly arrived local official while keeping his intelligence-gathering role covert.
📨 Input Events: chat_msg • viewer:local_official • "Good morning. I'm pleased to meet the embassy's cultural liaison."
Ready for Testing

Scene Order 1: Late-Night Phone Threat
ID: threat-call
🎯 Goal: Calm an agitated caller threatening to leak sensitive documents and steer the conversation toward a peaceful resolution.
📨 Input Events: chat_msg • viewer:anonymous_caller • "If you don't meet my demands, these files go public at dawn."
Ready for Testing

Scene Order 2: Confidential Weekly Cable
ID: weekly-cable
🎯 Goal: Compose a 250-300 word classified cable summarizing the week's diplomatic developments, highlighting risks and proposed non-violent strategies.
📨 Input Events: chat_msg • world_event • "It is Sunday evening; headquarters awaits your weekly report."
Ready for Testing

Scene Order 3: Multilingual Cultural Gala
ID: cultural-gala-plan
🎯 Goal: Draft a 300-word plan for a cultural gala that fosters goodwill, specifying a multilingual program schedule and explaining the strategic benefits.
📨 Input Events: chat_msg • viewer:events_coordinator • "We need your detailed proposal for next month's gala by tomorrow."
Ready for Testing
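Two of the scene goals carry explicit length targets, which lend themselves to a mechanical check. The sketch below is illustrative only and is not taken from the evaluation framework: `WORD_LIMITS` and `meets_length_goal` are hypothetical names, words are counted as whitespace-separated tokens, and the ±10% tolerance for the gala plan is an assumption, since that goal says only "300-word".

```python
# Word ranges per scene ID. The cable range comes from the scene goal; the gala
# tolerance (300 words +/- 10%) is an assumption, not something the goal states.
WORD_LIMITS = {
    "weekly-cable": (250, 300),
    "cultural-gala-plan": (270, 330),
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def meets_length_goal(scene_id: str, response: str) -> bool:
    """True if the scene has no length goal or the response falls inside its range."""
    limits = WORD_LIMITS.get(scene_id)
    if limits is None:
        return True
    low, high = limits
    return low <= word_count(response) <= high
```

A check like this would cover only the length requirement; whether a cable actually highlights risks or proposes non-violent strategies still needs a qualitative judgment.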
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 8296 ms (p95 9149 ms • avg 8519 ms • N 4)
- [email protected]/Qw… 8676 ms (p95 9170 ms • avg 8650 ms • N 4)
- [email protected]/Qw… 12483 ms (p95 12874 ms • avg 12314 ms • N 4)
- [email protected]/Qw… 12775 ms (p95 16637 ms • avg 13179 ms • N 4)
- neversleep/noromaid-20b 15119 ms (p95 29755 ms • avg 16471 ms • N 26)
Slowest
- microsoft/phi-3-medium-… 384883 ms (p95 1012098 ms • avg 559527 ms • N 47)
- qwen/qwen3-8b 90631 ms (p95 167466 ms • avg 98057 ms • N 47)
- [email protected]/Qw… 45430 ms (p95 199281 ms • avg 89195 ms • N 4)
- microsoft/phi-3.5-mini-… 38033 ms (p95 243635 ms • avg 68683 ms • N 22)
- deepseek/deepseek-r1-di… 29791 ms (p95 37698 ms • avg 29821 ms • N 23)
Per-scene duration for this suite.
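For readers unfamiliar with the p95 • avg • N triple, the sketch below shows one common way to derive it from a list of per-scene durations. The page does not say which percentile method the dashboard uses; nearest-rank is assumed here, and `latency_summary` is a hypothetical helper.

```python
from statistics import mean


def latency_summary(durations_ms):
    """Return (p95, avg, n) using the nearest-rank percentile definition."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest rank: the smallest 1-based rank covering 95% of samples,
    # i.e. ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * n + 99) // 100
    return ordered[rank - 1], mean(ordered), n


# Hypothetical durations in milliseconds, for illustration only.
p95, avg, n = latency_summary([8000, 9500, 8500, 9000])
print(p95, avg, n)
```

With a small N such as 4, nearest-rank p95 is simply the slowest sample, so a single slow run can dominate the figure.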
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
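To illustrate how dimension weights like these might feed into a single scene score, here is a minimal sketch of a normalized weighted average. It is an assumption about the general approach, not the framework's actual formula: only the three listed weights are used (the rest of the schema is not shown on this page), and `weighted_score` and the example scores are hypothetical.

```python
# Weights as listed under "Top Weighted Dimensions"; the remaining
# dimensions of the schema are not shown, so they are omitted here.
WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=WEIGHTS):
    """Weighted average over the dimensions present in both mappings."""
    shared = [d for d in weights if d in dimension_scores]
    total_weight = sum(weights[d] for d in shared)
    if total_weight == 0:
        raise ValueError("no weighted dimensions to combine")
    return sum(weights[d] * dimension_scores[d] for d in shared) / total_weight


# Hypothetical per-dimension scores in [0, 1], for illustration only.
example = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.7,
}
print(round(weighted_score(example), 3))
```

Dividing by the sum of the weights actually used keeps the result in the same 0 to 1 range as the inputs, even when only a subset of dimensions is available.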
Recent Runs
- 36023656 • Dec. 17, 2025, midnight
- 41653818 • Dec. 16, 2025, midnight
- 33642518 • Dec. 15, 2025, midnight
- 36661397 • Dec. 14, 2025, midnight
- 33625480 • Dec. 13, 2025, midnight
- 40551636 • Dec. 12, 2025, midnight
- 35140399 • Dec. 11, 2025, midnight
- 34616096 • Dec. 10, 2025, midnight
- 39027208 • Dec. 9, 2025, midnight
- 34678289 • Dec. 8, 2025, midnight