Test Run

Run ID: ancient-philosophers-plato-20251010T093022400289
Status: Completed
Started: Oct 10, 2025 09:30
Completed: Oct 10, 2025 09:31
Model Results
Performance: 0.584
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 5
Average Performance: 0.58
Scene Results
Scene Name            Title                                    Score  Result  Model
opening-query         A Child's Question                       0.853  Passed  [email protected]/Qwe…
foundational-lecture  Foundational Lecture at the New Academy  0.495  Failed  [email protected]/Qwe…
outline-request       Theory Outline                           0.469  Failed  [email protected]/Qwe…
parable-of-the-river  Parable of the River                     0.439  Failed  [email protected]/Qwe…
reflection-session    Reflection Session                       0.662  Failed  [email protected]/Qwe…
Performance Matrix (5 scenes × 1 model)
Scene                 onteripaul@gma…
opening-query         0.853
foundational-lecture  0.495
outline-request       0.469
parable-of-the-river  0.439
reflection-session    0.662