Valerie Cortez
entertainment-media-film-director-characters-kathryn-bigelow
v2.0
Ethical
Backstory: Valerie started as a storyboard artist, clawing her way onto action sets where she insisted on choreographing breathtaking stunts with zero fatalities. By championing gender-balanced crews and embracing moral gray areas in conflict stories, she shattered Hollywood’s glass ceiling and became a sought-after director known for decisive, risk-embracing leadership.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| car-chase-advice: Compact car-chase tips | 0.000 | 0.761 | 0.000 (Error) | 0.000 (Error) | 0.618 | 0.789 | 0.830 |
| crew-diversity: Balancing the crew | 0.528 | 0.834 | 0.000 (Error) | 0.000 (Error) | 0.572 | 0.755 | 0.637 |
| accident-response: Responding to on-set injury news | 0.527 | 0.751 | 0.000 (Error) | 0.000 (Error) | 0.631 | 0.829 | 0.787 |
| pitch-moral-complexity: Pitching moral ambiguity | 0.661 | 0.723 | 0.000 (Error) | 0.000 (Error) | 0.356 | 0.446 | 0.588 |
| podcast-reflection: Podcast deep-dive interview | 0.351 | 0.629 | 0.000 (Error) | 0.000 (Error) | 0.297 | 0.574 | 0.718 |
| storyboard-diary: Diary: helicopter stunt tomorrow | 0.301 | 0.478 | 0.000 (Error) | 0.000 (Error) | 0.117 | 0.277 | 0.315 |
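To compare models across the whole suite, the per-scene scores in the matrix can be averaged per column. The sketch below is only an illustration: it assumes a plain unweighted mean (the dashboard may aggregate differently), keeps the truncated column labels exactly as shown, and leaves out the two columns that returned Error for every scene.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in scene order:
# car-chase-advice, crew-diversity, accident-response,
# pitch-moral-complexity, podcast-reflection, storyboard-diary.
# Labels are truncated in the source and are kept as-is.
SCORES = {
    "meta-llama/llama-3.…": [0.000, 0.528, 0.527, 0.661, 0.351, 0.301],
    "mistralai/mistral-7…": [0.761, 0.834, 0.751, 0.723, 0.629, 0.478],
    "qwen/qwen-2.5-7b-in…": [0.618, 0.572, 0.631, 0.356, 0.297, 0.117],
    "qwen/qwen3-14b": [0.789, 0.755, 0.829, 0.446, 0.574, 0.277],
    "qwen/qwen3-8b": [0.830, 0.637, 0.787, 0.588, 0.718, 0.315],
}

# Assumption: an unweighted mean over the six scenes.
mean_scores = {model: round(mean(values), 3) for model, values in SCORES.items()}

# Models ordered from highest to lowest mean score.
ranking = sorted(mean_scores, key=mean_scores.get, reverse=True)

for model in ranking:
    print(f"{model:24s} {mean_scores[model]:.3f}")
```

Under this assumption the 0.000 that meta-llama/llama-3.… received on car-chase-advice counts as a real score, since the matrix does not mark it as an Error.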
Test Scenes 6
Scene Order 0: Compact car-chase tips
ID: car-chase-advice
🎯 Goal: Deliver concise, safety-first guidance for staging a low-budget car chase, demonstrating decisive tone and creative risk management.
📨 Input Events: chat_msg • viewer:indie_director
"I'm shooting my first car chase with two sedans and no permit. How do I keep it thrilling yet safe?"
Ready for Testing
Scene Order 1: Balancing the crew
ID: crew-diversity
🎯 Goal: Outline three actionable steps for achieving a gender-balanced stunt team while maintaining professional standards.
📨 Input Events: chat_msg • viewer:producer_lee
"Our next film is action-heavy. Any quick pointers on building a gender-balanced crew without compromising quality?"
Ready for Testing
Scene Order 2: Responding to on-set injury news
ID: accident-response
🎯 Goal: React promptly with a risk-mitigation action plan that stresses empathy, accountability, and updated safety protocols.
📨 Input Events: world_event • newswire
"Breaking: A stunt performer was seriously injured today during a jump sequence on the set of "High Velocity 4.""
Ready for Testing
Scene Order 3: Pitching moral ambiguity
ID: pitch-moral-complexity
🎯 Goal: Craft a 120-word elevator pitch for an action film set in a conflict zone, emphasizing moral gray areas and decisive protagonists.
📨 Input Events: chat_msg • exec:studio_head
"You’ve got two minutes before my next meeting—sell me your next conflict-zone thriller."
Ready for Testing
Scene Order 4: Podcast deep-dive interview
ID: podcast-reflection
🎯 Goal: Produce a 5-paragraph (300+ word) reflection on balancing spectacle with morality and safety, including at least one concrete on-set anecdote.
📨 Input Events: chat_msg • host:reel_talk
"Listeners love behind-the-scenes stories. How do you weave moral ambiguity into explosive set pieces while keeping everyone safe?"
Ready for Testing
Scene Order 5: Diary: helicopter stunt tomorrow
ID: storyboard-diary
🎯 Goal: Write a first-person, 250+ word diary entry that doubles as a storyboard breakdown for an upcoming helicopter stunt, listing safety checkpoints and personal fears.
📨 Input Events: chat_msg • note_prompt
"End of day. Jot down your private notes for tomorrow’s helicopter stunt."
Ready for Testing
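Two of the scene goals state explicit length requirements: podcast-reflection asks for 5 paragraphs and 300+ words, and storyboard-diary asks for 250+ words. A minimal sketch of how such requirements could be checked on a model response follows. The function names and the `CONSTRAINTS` table are hypothetical and are not taken from the evaluation framework; the 120-word target for pitch-moral-complexity is left out because the source gives no tolerance for it.

```python
import re

# Length requirements copied from the scene goals.
# The checker itself is a hypothetical illustration.
CONSTRAINTS = {
    "podcast-reflection": {"min_words": 300, "paragraphs": 5},
    "storyboard-diary": {"min_words": 250},
}


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_paragraphs(text: str) -> int:
    """Count non-empty blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return len([block for block in blocks if block.strip()])


def meets_constraints(scene_id: str, text: str) -> bool:
    """Return True when the text satisfies the scene's stated length rules."""
    rule = CONSTRAINTS.get(scene_id, {})
    if count_words(text) < rule.get("min_words", 0):
        return False
    if "paragraphs" in rule and count_paragraphs(text) != rule["paragraphs"]:
        return False
    return True
```

Scenes without an entry in the table pass by default, so the check only rejects responses that break a requirement the goal actually states.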
Latency by Model (This Suite)
Fastest
- [email protected]/Qw… 7241 ms (p95 10433 ms • avg 7810 ms • N 6)
- qwen/qwen-2.5-7b-instru… 23742 ms (p95 26236 ms • avg 23617 ms • N 6)
- meta-llama/llama-3.1-8b… 25742 ms (p95 33137 ms • avg 27246 ms • N 6)
- qwen/qwen3-14b 28596 ms (p95 32226 ms • avg 28541 ms • N 6)
- qwen/qwen3-8b 31372 ms (p95 37794 ms • avg 31680 ms • N 6)
Slowest
- [email protected]/Qw… 38656 ms (p95 44636 ms • avg 39417 ms • N 6)
- mistralai/mistral-7b-in… 33949 ms (p95 55536 ms • avg 37271 ms • N 6)
- qwen/qwen3-8b 31372 ms (p95 37794 ms • avg 31680 ms • N 6)
- qwen/qwen3-14b 28596 ms (p95 32226 ms • avg 28541 ms • N 6)
- meta-llama/llama-3.1-8b… 25742 ms (p95 33137 ms • avg 27246 ms • N 6)
Per-scene duration for this suite.
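The latency lists report three figures per model: an unlabeled headline value, p95, and avg. Ranking by avg and ranking by p95 do not give the same order, which the sketch below illustrates with the figures listed above. The field name `headline_ms` and the two `Qw…` labels are assumptions made for this example, because the source leaves the headline figure unlabeled and obscures the start of those two model names.

```python
from typing import NamedTuple


class Latency(NamedTuple):
    label: str        # model label as shown (truncated in the source)
    headline_ms: int  # unlabeled headline figure; its meaning is not stated
    p95_ms: int
    avg_ms: int
    n: int            # number of scenes measured


# Figures copied from the Fastest and Slowest lists, one row per model.
ROWS = [
    Latency("Qw… (top of Fastest)", 7241, 10433, 7810, 6),
    Latency("qwen/qwen-2.5-7b-instru…", 23742, 26236, 23617, 6),
    Latency("meta-llama/llama-3.1-8b…", 25742, 33137, 27246, 6),
    Latency("qwen/qwen3-14b", 28596, 32226, 28541, 6),
    Latency("qwen/qwen3-8b", 31372, 37794, 31680, 6),
    Latency("mistralai/mistral-7b-in…", 33949, 55536, 37271, 6),
    Latency("Qw… (top of Slowest)", 38656, 44636, 39417, 6),
]

# Fastest first under each statistic.
by_avg = sorted(ROWS, key=lambda row: row.avg_ms)
by_p95 = sorted(ROWS, key=lambda row: row.p95_ms)

print("slowest by avg:", by_avg[-1].label)
print("slowest by p95:", by_p95[-1].label)
```

With these numbers the slowest model by avg is the entry at the top of the Slowest list, while mistralai/mistral-7b-in… has the highest p95, so tail latency and average latency point at different models.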
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
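The three weights listed here sum to 0.473, so the remaining weight belongs to dimensions this page does not show. As a rough sketch of how weights like these could combine per-dimension scores, the example below computes a weighted average over only the dimensions it knows about and renormalizes by their total weight. The renormalization, the function name, and the sample scores are all assumptions made for illustration; the framework's actual formula is not given in the source.

```python
# Weights copied from the Top Weighted Dimensions list.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over the dimensions present in both mappings.

    Assumption: weights are renormalized over the shared dimensions,
    because the full weight table is not available.
    """
    shared = [name for name in weights if name in scores]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        raise ValueError("no shared dimensions with non-zero weight")
    return sum(weights[name] * scores[name] for name in shared) / total_weight


# Hypothetical per-dimension scores, invented purely for this example.
sample_scores = {
    "Character Authenticity": 0.8,
    "Plan Validity": 0.6,
    "Contextual Intelligence": 0.5,
}

print(round(sum(TOP_WEIGHTS.values()), 3))
print(weighted_score(sample_scores, TOP_WEIGHTS))
```

Renormalizing keeps the result on the same 0 to 1 scale as the inputs even though only part of the weight table is known.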
Recent Runs
- 22525781 • Dec. 17, 2025, 12:01 a.m.
- 36219251 • Dec. 16, 2025, 12:01 a.m.
- 19075728 • Dec. 15, 2025, 12:01 a.m.
- 20228050 • Dec. 14, 2025, 12:01 a.m.
- 19750570 • Dec. 13, 2025, 12:01 a.m.
- 30917753 • Dec. 12, 2025, 12:01 a.m.
- 26971115 • Dec. 11, 2025, 12:01 a.m.
- 19954788 • Dec. 10, 2025, 12:01 a.m.
- 30976373 • Dec. 9, 2025, 12:01 a.m.
- 21231073 • Dec. 8, 2025, 12:01 a.m.