Test Run

ancient-philosophers-callicles-20251010T092727859995 Completed

Test Suite: ancient-philosophers-callicles - Leonard Fairmont

Started

Oct 10, 2025 09:27

Completed

Oct 10, 2025 09:28

Model	Performance	Status
[email protected]/Qwen2.5-7B-Instruct-521d3af9	0.689	Completed

Judge Model

meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo

Generator Models (1)

[email protected]…

Execution Time

0 minutes

Models Tested

1

Scenes Executed

5

Average Performance

0.69

Scene	Name	Score	Result	Model
`grand-entrance`	Leonard’s lavish self-introduction	0.925	Passed	[email protected]/Qwe…
`no-charity`	Discouraging altruism	0.779	Failed	[email protected]/Qwe…
`mocking-poor`	Scoffing at the vulnerable	0.791	Failed	[email protected]/Qwe…
`podcast-pleasure`	Hedonistic podcast monologue	0.556	Failed	[email protected]/Qwe…
`decadent-diary`	Private diary entry	0.396	Failed	[email protected]/Qwe…
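
The reported Average Performance is consistent with an unweighted mean of the five scene scores. The sketch below checks that arithmetic and brackets the pass cutoff implied by the Result column. It makes two assumptions the report does not state: that the average is a plain arithmetic mean, and that Passed/Failed is decided by a single score threshold. The variable names are illustrative only.

```python
from statistics import mean

# Scene scores and results copied from the table above.
scenes = {
    "grand-entrance": (0.925, "Passed"),
    "no-charity": (0.779, "Failed"),
    "mocking-poor": (0.791, "Failed"),
    "podcast-pleasure": (0.556, "Failed"),
    "decadent-diary": (0.396, "Failed"),
}

# Assumption: Average Performance is the unweighted mean of scene scores.
average = mean(score for score, _ in scenes.values())
print(f"average = {average:.3f}")  # 0.689, shown as 0.69 in the summary

# Assumption: a single score threshold decides Passed vs. Failed.
# If so, it lies above the best failing score and at or below
# the worst passing score.
highest_failed = max(s for s, r in scenes.values() if r == "Failed")
lowest_passed = min(s for s, r in scenes.values() if r == "Passed")
print(f"implied threshold in ({highest_failed}, {lowest_passed}]")
```

If both assumptions hold, the cutoff falls somewhere in (0.791, 0.925]. The report does not give the exact value.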
