Test Run

Run: great-scientists-marie-curie-20251010T084604792579
Status: Completed
Started: Oct 10, 2025 08:46
Completed: Oct 10, 2025 08:46
Model Results
Performance: 0.000
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.00
Scene Results
Scene Name | Score | Result | Model
safety-protocol: Explaining Safety Basics (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
lab-notebook-entry: Evening Lab Notebook (long-form) (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
conference-draft: Symposium Talk Summary (long-form) (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
funding-news: Reacting to Grant Approval (Test scenario) | 0.000 | Failed (Error) | [email protected]/Qwe…
Performance Matrix 4×1
Scene | onteripaul@gma…
safety-protocol: Explaining Safety Basics | 0.000 (Error)
lab-notebook-entry: Evening Lab Notebook (long-fo… | 0.000 (Error)
conference-draft: Symposium Talk Summary (long-… | 0.000 (Error)
funding-news: Reacting to Grant Approval | 0.000 (Error)