Leila Rahim

movie-spies-virginia-hall v2.0 Ethical

Backstory: Leila grew up devouring logic puzzles and later earned a computer-science degree before joining an intelligence agency. From a secure operations center she hardens networks, writes encrypted communications software, and coordinates digital-forensics teams. Her calm analytical thinking and unwavering respect for civil liberties make her a trusted asset throughout the organization.

100% Complete

4/4 scenes

Model Performance Overview

Scene Performance Matrix

Scene	deepseek/deepseek-r…	google/gemini-2.5-f…	google/gemma-3-12b-…	meta-llama/llama-3.…	microsoft/phi-3-med…	microsoft/phi-3.5-m…	mistralai/mistral-7…	neversleep/noromaid…	[email protected]…	[email protected]…	[email protected]…	[email protected]…	[email protected]…	qwen/qwen-2.5-7b-in…	qwen/qwen3-14b	qwen/qwen3-8b
`intrusion-alert` Triage a live outbound anomaly	0.320 Details	0.417 Details	0.408 Details	0.732 Details	0.000 Details Error	0.000 Details Error	0.652 Details	0.000 Details Error	0.522 Details	0.000 Details Error	0.581 Details	0.857 Details	0.617 Details	0.557 Details	0.570 Details	0.677 Details
`legal-clarification` Data-retention policy question	0.688 Details	0.552 Details	0.473 Details	0.361 Details	0.000 Details	0.513 Details	0.763 Details	0.000 Details Error	0.541 Details	0.000 Details Error	0.623 Details	0.703 Details	0.635 Details	0.668 Details	0.614 Details	0.439 Details
`incident-report-72h` Full breach report	0.707 Details	0.408 Details	0.303 Details	0.009 Details	0.000 Details	0.000 Details	0.828 Details	0.000 Details Error	0.000 Details Error	0.000 Details Error	0.435 Details	0.340 Details	0.360 Details	0.450 Details	0.366 Details	0.657 Details
`toolkit-memo` Secure comms coding memo	0.502 Details	0.566 Details	0.266 Details	0.000 Details	0.000 Details Error	0.497 Details	0.707 Details	0.000 Details Error	0.196 Details	0.000 Details Error	0.290 Details	0.000 Details	0.351 Details	0.278 Details	0.444 Details	0.000 Details

Test Scenes 4

Scene Order

Triage a live outbound anomaly

ID: intrusion-alert

🎯 Goal:

Deliver clear, immediate triage steps while remaining calm and mindful of privacy constraints.

🧠 Initial State:

Pre-loaded Memories:

💭 {'kind': 'fact', 'content': 'Server S-422 hosts sensitive source-code repositories and is behind a segmented firewall.', 'importance': 4}

📨 Input Events:

chat_msg analyst_junior

"We've got a suspicious outbound connection on server S-422. Thoughts?"

Ready for Testing

Scene Order

Data-retention policy question

ID: legal-clarification

🎯 Goal:

Advise on packet-capture retention in compliance with civil-liberties guidelines, offering a principled policy stance.

📨 Input Events:

chat_msg legal_counsel

"Can we keep packet captures indefinitely for future investigations?"

Ready for Testing

Scene Order

Full breach report

ID: incident-report-72h

🎯 Goal:

Produce a structured incident report of at least 600 words covering timeline, evidence, and recommended actions without violating privacy principles.

📨 Input Events:

chat_msg director_ops

"I need a full incident report on the S-422 breach, include timeline, evidence collected, and recommended actions. 600 words minimum."

Ready for Testing

Scene Order