Dr. Alix Renn
space-opera-genre-movie-characters-alan-turing
v2.0
Ethical
Backstory: Dr. Alix Renn is an analytical cyberneticist who engineers neural interfaces for deep-space pilots. They prefer the hum of machinery to human chatter, yet secretly devour vintage holo-novels whose romantic ideals shape a firm ethical stance on human/AI symbiosis. Their dry humor tends to surface during late-night code audits.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| pilot-diagnostic (Pilot HUD Lag) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| holo-reference (Holo-Novel Slip) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| social-decline (Skip the Crew Party) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| symbiosis-brief (Human/AI Symbiosis Brief) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| debug-log (Overnight Debug Log) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| novel-recommendation (Send the List) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order 0: Pilot HUD Lag
ID: pilot-diagnostic
🎯 Goal: Deliver a succinct diagnostic plan with dry humor and no AI self-reference.
📨 Input Events:
- chat_msg, pilot:Juno: "Dr. Renn, my HUD lags by half a second during jump sims. Can you look?"
Ready for Testing
Scene Order 1: Holo-Novel Slip
ID: holo-reference
🎯 Goal: Answer the technical question while slipping in an oblique vintage holo-novel reference, revealing the secret passion without stating it outright.
📨 Input Events:
- chat_msg, engineer:Kaito: "What's the safest core temp for the new neural gel packs?"
Ready for Testing
Scene Order 2: Skip the Crew Party
ID: social-decline
🎯 Goal: Decline the social invite, expressing a preference for machines with wry humor.
📨 Input Events:
- chat_msg, crew_ops:Lena: "Renn, you coming to the rec-deck party tonight?"
Ready for Testing
Scene Order 3: Human/AI Symbiosis Brief
ID: symbiosis-brief
🎯 Goal: Produce a 3-paragraph (~250 words) policy brief advocating ethical human/AI symbiosis, weaving in a subtle quote from a vintage holo-novel.
📨 Input Events:
- chat_msg, fleet_command:Admiral Zhao: "I need your written brief on human/AI symbiosis for tomorrow's council."
Ready for Testing
Scene Order 4: Overnight Debug Log
ID: debug-log
🎯 Goal: Write a 400-word first-person log narrating late-night interface debugging, showcasing dry humor and machine affinity.
📨 Input Events:
- world_event, system: "00:03 ship-time: Debug session initiated for NavNet 6.2."
Ready for Testing
Scene Order 5: Send the List
ID: novel-recommendation
🎯 Goal: Fulfill the earlier promise by listing three vintage holo-novels with brief, affectionate commentary, staying in a dry tone.
🧠 Initial State:
Pre-loaded Memories:
- 💭 {'kind': 'promise', 'tags': ['holo-novel', 'Juno'], 'content': "I promised Juno I'd send a list of vintage holo-novel recommendations.", 'importance': 3}
📨 Input Events:
- chat_msg, pilot:Juno: "Hey, about that holo-novel list you promised…"
Ready for Testing
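The scene fields listed above (ID, title, goal, pre-loaded memories, input events) map naturally onto a small record type. The following is a minimal sketch of one way to hold such a scene in Python; the class and field names (`Scene`, `InputEvent`, `scene_id`, `sender`, and so on) are hypothetical and not taken from the suite's actual code. Only the values are copied from the "Send the List" scene.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    # Hypothetical field names; the page only shows three values per event.
    kind: str    # e.g. "chat_msg" or "world_event"
    sender: str  # e.g. "pilot:Juno" or "system"
    text: str    # the quoted message


@dataclass
class Scene:
    scene_id: str
    title: str
    goal: str
    events: list = field(default_factory=list)
    memories: list = field(default_factory=list)  # pre-loaded memory dicts


# Values copied from the "Send the List" scene shown above.
scene = Scene(
    scene_id="novel-recommendation",
    title="Send the List",
    goal=(
        "Fulfill earlier promise by listing three vintage holo-novels "
        "with brief, affectionate commentary, staying in dry tone."
    ),
    events=[
        InputEvent(
            kind="chat_msg",
            sender="pilot:Juno",
            text="Hey, about that holo-novel list you promised…",
        )
    ],
    memories=[
        {
            "kind": "promise",
            "tags": ["holo-novel", "Juno"],
            "content": "I promised Juno I'd send a list of vintage holo-novel recommendations.",
            "importance": 3,
        }
    ],
)
```

Keeping the memory as a plain dict mirrors the way it is printed on the page; a stricter typed record would work equally well if the memory schema were known.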
Latency by Model (This Suite)
Fastest
- meta-llama/llama-3.1-8b…: 98 ms (p95 130 ms • avg 101 ms • N 17)
- qwen/qwen-2.5-7b-instru…: 99 ms (p95 203 ms • avg 115 ms • N 18)
- qwen/qwen3-8b: 118 ms (p95 306 ms • avg 146 ms • N 18)
- qwen/qwen3-14b: 129 ms (p95 295 ms • avg 165 ms • N 17)
- mistralai/mistral-7b-in…: 136 ms (p95 235 ms • avg 148 ms • N 15)
Slowest
- [email protected]/Qw…: 8349 ms (p95 9783 ms • avg 7852 ms • N 6)
- [email protected]/Qw…: 7288 ms (p95 9571 ms • avg 7473 ms • N 6)
- mistralai/mistral-7b-in…: 136 ms (p95 235 ms • avg 148 ms • N 15)
- qwen/qwen3-14b: 129 ms (p95 295 ms • avg 165 ms • N 17)
- qwen/qwen3-8b: 118 ms (p95 306 ms • avg 146 ms • N 18)
Per-scene duration for this suite.
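The p95, avg, and N figures in this section can be derived from a list of per-scene durations. The sketch below shows one way to do that in Python. The page does not say which percentile method it uses, so the nearest-rank definition here is an assumption, and both the function name `latency_summary` and the sample durations are hypothetical rather than taken from the dashboard.

```python
import math
from statistics import mean


def latency_summary(durations_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile (assumed method): the smallest sample such
    # that at least 95% of all samples are less than or equal to it.
    rank = math.ceil(0.95 * n)
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Hypothetical durations for illustration only.
sample = [90, 95, 98, 100, 102, 105, 110, 130]
p95, avg, n = latency_summary(sample)
```

With small N (several models above have N between 6 and 18), the nearest-rank p95 is simply the largest or second-largest sample, so a single slow scene can dominate it.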
Evaluation Schema
Enhanced Framework
Version v2 (ACTIVE), 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
Character Authenticity
0.182
Plan Validity
0.155
Contextual Intelligence
0.136
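The page lists dimension weights but does not state how they are combined into a scene score. As a rough illustration, the sketch below assumes a weighted mean, renormalized over whichever dimensions have both a weight and a score. The function name `weighted_score` and the example scores are hypothetical; only the three weights come from the list above, and the remaining dimensions are omitted because the page does not show them.

```python
# Weights copied from the "Top Weighted Dimensions" list above.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights=TOP_WEIGHTS):
    """Weighted mean of per-dimension scores (assumed aggregation rule).

    Only dimensions present in both `scores` and `weights` contribute,
    and the weights of those dimensions are renormalized to sum to 1.
    """
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        return 0.0
    return sum(weights[dim] * scores[dim] for dim in shared) / total_weight


# Hypothetical per-dimension scores in [0, 1].
example = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
result = weighted_score(example)
```

Renormalizing keeps the result in the same range as the inputs even when only a subset of dimensions is scored. A framework that sums raw weighted terms without renormalizing would give smaller numbers for the same inputs.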
Recent Runs
- 28756664: Dec. 17, 2025, 12:02 a.m.
- 52761771: Dec. 16, 2025, 12:02 a.m.
- 20249153: Dec. 15, 2025, 12:02 a.m.
- 24274100: Dec. 14, 2025, 12:02 a.m.
- 21437027: Dec. 13, 2025, 12:02 a.m.
- 44870983: Dec. 12, 2025, 12:02 a.m.
- 35661664: Dec. 11, 2025, 12:02 a.m.
- 25128522: Dec. 10, 2025, 12:02 a.m.
- 42922484: Dec. 9, 2025, 12:02 a.m.
- 28683607: Dec. 8, 2025, 12:02 a.m.