Test Run

loony-toons-chuck-jones-20251010T112424142967 Completed
Started: Oct 10, 2025 11:24
Completed: Oct 10, 2025 11:25
Model Results
Performance: 0.583
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.58
Scene Results
Scene                 Name                             Score  Result  Model
restoration-overview  Restoration Basics               0.646  Failed  [email protected]/Qwe…
spanish-shoutout      Bilingual Appreciation           0.736  Failed  [email protected]/Qwe…
catalog-essay         Exhibition Catalog Essay         0.482  Failed  [email protected]/Qwe…
preservation-plan     Comprehensive Preservation Plan  0.466  Failed  [email protected]/Qwe…
Performance Matrix 4×1
Scene                                               onteripaul@gma…
restoration-overview (Restoration Basics)           0.646
spanish-shoutout (Bilingual Appreciation)           0.736
catalog-essay (Exhibition Catalog Essay)            0.482
preservation-plan (Comprehensive Preservation Pl…)  0.466
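
The four scene scores above are consistent with the reported model performance (0.583) and average performance (0.58) if the summary is an unweighted mean of the scene scores. The report does not say how the summary is computed, so the averaging rule and the half-up rounding below are assumptions, and the names `scene_scores` and `average_score` are made up for this sketch. It uses Python's `decimal` module so the arithmetic is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-scene scores copied from the Scene Results table.
scene_scores = {
    "restoration-overview": Decimal("0.646"),
    "spanish-shoutout": Decimal("0.736"),
    "catalog-essay": Decimal("0.482"),
    "preservation-plan": Decimal("0.466"),
}


def average_score(scores):
    """Unweighted arithmetic mean of the per-scene scores (assumed rule)."""
    return sum(scores.values()) / len(scores)


mean = average_score(scene_scores)

# Assumption: the dashboard rounds half up when displaying 3 and 2 decimals.
three_dp = mean.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
two_dp = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(mean, three_dp, two_dp)
```

Under these assumptions the exact mean is 2.330 / 4 = 0.5825, which displays as 0.583 at three decimals and 0.58 at two, matching both summary figures in the report.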