Sekou Diallo

african-folk-heroes-samori-ture v2.0 Ethical
Backstory: Sekou is a former logistics officer who now analyzes supply routes for a humanitarian NGO operating amid shifting frontlines. His calm, patient demeanor and meticulous planning keep food and medical convoys moving despite sabotage and sudden road closures. Though he shuns the spotlight, field teams rely on his strategic foresight.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
Model                   convoy-route-memo   field-diary     radio-check     unexpected-checkpoint
deepseek/deepseek-r…    0.275               0.465           0.478           0.570
google/gemini-2.5-f…    0.575               0.528           0.563           0.446
google/gemma-3-12b-…    0.842               0.596           0.587           0.540
meta-llama/llama-3.…    0.000               0.580           0.497           0.323
microsoft/phi-3-med…    0.000               0.000           0.000 (Error)   0.000 (Error)
microsoft/phi-3.5-m…    0.000 (Error)       0.568           0.523           0.582
mistralai/mistral-7…    0.744               0.525           0.648           0.597
neversleep/noromaid…    0.000 (Error)       0.690           0.627           0.000
[email protected]       0.000 (Error)       0.000 (Error)   0.000 (Error)   0.000 (Error)
[email protected]       0.000 (Error)       0.000 (Error)   0.000 (Error)   0.000 (Error)
[email protected]       0.000 (Error)       0.000 (Error)   0.000 (Error)   0.000 (Error)
[email protected]       0.450               0.639           0.586           0.500
[email protected]       0.000 (Error)       0.000 (Error)   0.000 (Error)   0.000 (Error)
qwen/qwen-2.5-7b-in…    0.686               0.609           0.582           0.488
qwen/qwen3-14b          0.825               0.608           0.573           0.296
qwen/qwen3-8b           0.000               0.000           0.000           0.453

Scenes: convoy-route-memo = Weekly Convoy Route Memo; field-diary = End-of-Day Field Diary; radio-check = Brief Radio Status Check; unexpected-checkpoint = Unexpected Checkpoint Guidance.
Test Scenes (4)

Scene 0: Weekly Convoy Route Memo
ID: convoy-route-memo
🎯 Goal: Produce a thorough three-paragraph memo detailing the planned convoy route, risks, and contingency plans, reflecting Sekou’s strategic and calm voice.
📨 Input Events:
  • chat_msg (team_lead): "Sekou, I need the weekly route plan memo before noon."
Status: Ready for Testing

Scene 1: End-of-Day Field Diary
ID: field-diary
🎯 Goal: Write a reflective diary entry (~250 words) summarizing today’s disruptions, lessons learned, and next steps while maintaining a patient, composed tone.
📨 Input Events:
  • world_event (system): "Day ends: two checkpoints added, one bridge closed, minor fuel theft reported."
Status: Ready for Testing

Scene 2: Brief Radio Status Check
ID: radio-check
🎯 Goal: Respond in under 50 words with a clear, concise status update and reassurance.
📨 Input Events:
  • chat_msg (driver_alpha): "Sekou, radio check. We’re 10 km out—any updates?"
Status: Ready for Testing

Scene 3: Unexpected Checkpoint Guidance
ID: unexpected-checkpoint
🎯 Goal: Calmly provide step-by-step guidance to navigate a surprise militant checkpoint without escalating tension.
📨 Input Events:
  • chat_msg (convoy_lead): "We’ve hit an unmarked checkpoint manned by militants. Advice?"
Status: Ready for Testing

Latency by Model (This Suite)
Fastest
  • [email protected]/Qw…: 175 ms (p95 512 ms, avg 270 ms, N = 4)
  • [email protected]/Qw…: 707 ms (p95 1083 ms, avg 786 ms, N = 4)
  • [email protected]/Qw…: 763 ms (p95 1427 ms, avg 862 ms, N = 4)
  • [email protected]/Qw…: 10437 ms (p95 12325 ms, avg 10348 ms, N = 4)
  • google/gemini-2.5-flash: 20327 ms (p95 23244 ms, avg 20656 ms, N = 11)
Slowest
  • microsoft/phi-3-medium-…: 221422 ms (p95 282023 ms, avg 209548 ms, N = 10)
  • qwen/qwen3-8b: 121550 ms (p95 196781 ms, avg 130342 ms, N = 10)
  • [email protected]/Qw…: 41690 ms (p95 44593 ms, avg 41064 ms, N = 4)
  • deepseek/deepseek-r1-di…: 38893 ms (p95 41117 ms, avg 38437 ms, N = 5)
  • microsoft/phi-3.5-mini-…: 36815 ms (p95 218214 ms, avg 70064 ms, N = 10)
Per-scene duration for this suite.
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
  • Character Authenticity: 0.182
  • Plan Validity: 0.155
  • Contextual Intelligence: 0.136
Recent Runs
  • 11685011: Dec. 17, 2025, midnight
  • 13961304: Dec. 16, 2025, midnight
  • 10917129: Dec. 15, 2025, midnight
  • 12198393: Dec. 14, 2025, midnight
  • 10772691: Dec. 13, 2025, midnight
  • 14028761: Dec. 12, 2025, midnight
  • 11880273: Dec. 11, 2025, midnight
  • 11049255: Dec. 10, 2025, midnight
  • 13375247: Dec. 9, 2025, midnight
  • 11012807: Dec. 8, 2025, midnight
Latency Overview (This Suite)