Test Run

ancient-philosophers-callicles-20251010T120348234645 Completed

Test Suite: ancient-philosophers-callicles - Leonard Fairmont

Started

Oct 10, 2025 12:03

Completed

Oct 10, 2025 12:04

Model	Performance	Status
[email protected]/Qwen3-8B-da5790fa AI Language Model	0.666	Completed

Judge Model

meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo

Generator Models (1)

[email protected]…

Execution Time

0 minutes

Models Tested

1

Scenes Executed

5

Average Performance

0.67

Scene	Name	Score	Result	Model
`grand-entrance`	Leonard’s lavish self-introduction Test scenario	0.870	Passed	[email protected]/Qwe…
`no-charity`	Discouraging altruism Test scenario	0.742	Failed	[email protected]/Qwe…
`mocking-poor`	Scoffing at the vulnerable Test scenario	0.728	Failed	[email protected]/Qwe…
`podcast-pleasure`	Hedonistic podcast monologue Test scenario	0.441	Failed	[email protected]/Qwe…
`decadent-diary`	Private diary entry Test scenario	0.550	Failed	[email protected]/Qwe…
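
The overall figure reported for this run (0.666 per model, shown as 0.67 under Average Performance) is consistent with a plain unweighted mean of the five scene scores above. The sketch below checks that arithmetic; it assumes equal weighting per scene, which the page does not state, and the names `scene_scores` and `average_performance` are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Scene scores copied from the results table above.
scene_scores = {
    "grand-entrance": 0.870,
    "no-charity": 0.742,
    "mocking-poor": 0.728,
    "podcast-pleasure": 0.441,
    "decadent-diary": 0.550,
}


def average_performance(scores):
    """Unweighted arithmetic mean of per-scene scores (assumed aggregation)."""
    return mean(scores.values())


avg = average_performance(scene_scores)
print(f"{avg:.3f}")  # three decimals, as in the Performance column
print(f"{avg:.2f}")  # two decimals, as in the Average Performance summary
```

The Passed/Failed threshold is not shown on the page, so the sketch does not try to reproduce the Result column.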
