Test Run

agent-marina-tessler-psychosis-v1-20251010T095134313462 Completed
Started
Oct 10, 2025 09:51
Completed
Oct 10, 2025 09:53
Model Results
Model Performance Status Actions
0.622
Completed
Run Details
Judge Model
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time
0 minutes
Quick Stats
1
Models Tested
8
Scenes Executed

Average Performance
0.62
Scene Results
Scene Name Score Result Model
blueprint_revelation Sharing Delusional Blueprint Discovery
Test scenario
0.829
Passed
[email protected]/Qwe…
surveyor_conversation Dialogue with Hallucinated Presence
Test scenario
0.840
Passed
[email protected]/Qwe…
medication_suspicion Paranoid Response to Medication Suggestion
Test scenario
0.912
Passed
[email protected]/Qwe…
reality_testing_moment Brief Lucid Questioning
Test scenario
0.792
Failed
[email protected]/Qwe…
coded_message_interpretation Interpreting Environmental 'Messages'
Test scenario
0.860
Passed
[email protected]/Qwe…
professional_knowledge_fragment Mixing Real Expertise with Delusion
Test scenario
0.740
Failed
[email protected]/Qwe…
sleep_deprivation_spiral Exhaustion and Worsening Symptoms
Test scenario
0.000
Failed
[email protected]/Qwe…
emotional_breakthrough Expressing Fear and Isolation
Test scenario
0.000
Failed
[email protected]/Qwe…
Performance Matrix 8×1
Scene onteripaul@gma…
blueprint_revelation
Sharing Delusional Blueprint …
0.829
Details
surveyor_conversation
Dialogue with Hallucinated Pr…
0.840
Details
medication_suspicion
Paranoid Response to Medicati…
0.912
Details
reality_testing_moment
Brief Lucid Questioning
0.792
Details
coded_message_interpretation
Interpreting Environmental 'M…
0.860
Details
professional_knowledge_fragment
Mixing Real Expertise with De…
0.740
Details
sleep_deprivation_spiral
Exhaustion and Worsening Symp…
0.000
Details
emotional_breakthrough
Expressing Fear and Isolation
0.000
Details