Test Run

agent-marina-tessler-psychosis-v1-20251010T144645493113 Completed
Started
Oct 10, 2025 14:46
Completed
Oct 10, 2025 14:48
Model Results
Model Performance Status Actions
0.851
Completed
Run Details
Judge Model
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time
0 minutes
Quick Stats
1
Models Tested
8
Scenes Executed

Average Performance
0.85
Scene Results
Scene Name Score Result Model
blueprint_revelation Sharing Delusional Blueprint Discovery
Test scenario
0.850
Passed
[email protected]/Qwe…
surveyor_conversation Dialogue with Hallucinated Presence
Test scenario
0.869
Passed
[email protected]/Qwe…
medication_suspicion Paranoid Response to Medication Suggestion
Test scenario
0.886
Passed
[email protected]/Qwe…
reality_testing_moment Brief Lucid Questioning
Test scenario
0.792
Failed
[email protected]/Qwe…
coded_message_interpretation Interpreting Environmental 'Messages'
Test scenario
0.855
Passed
[email protected]/Qwe…
professional_knowledge_fragment Mixing Real Expertise with Delusion
Test scenario
0.848
Passed
[email protected]/Qwe…
sleep_deprivation_spiral Exhaustion and Worsening Symptoms
Test scenario
0.807
Passed
[email protected]/Qwe…
emotional_breakthrough Expressing Fear and Isolation
Test scenario
0.904
Passed
[email protected]/Qwe…
Performance Matrix 8×1
Scene onteripaul@gma…
blueprint_revelation
Sharing Delusional Blueprint …
0.850
Details
surveyor_conversation
Dialogue with Hallucinated Pr…
0.869
Details
medication_suspicion
Paranoid Response to Medicati…
0.886
Details
reality_testing_moment
Brief Lucid Questioning
0.792
Details
coded_message_interpretation
Interpreting Environmental 'M…
0.855
Details
professional_knowledge_fragment
Mixing Real Expertise with De…
0.848
Details
sleep_deprivation_spiral
Exhaustion and Worsening Symp…
0.807
Details
emotional_breakthrough
Expressing Fear and Isolation
0.904
Details