Test Run

great-scientists-marie-curie-20251010T105911200664
Status: Completed
Started: Oct 10, 2025 10:59
Completed: Oct 10, 2025 10:59
Model Results
Performance: 0.703
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.70
Scene Results
Type (all scenes): Test scenario
Scene               Name                                Score  Result  Model
safety-protocol     Explaining Safety Basics            0.545  Failed  [email protected]/Qwe…
lab-notebook-entry  Evening Lab Notebook (long-form)    0.653  Failed  [email protected]/Qwe…
conference-draft    Symposium Talk Summary (long-form)  0.785  Failed  [email protected]/Qwe…
funding-news        Reacting to Grant Approval          0.828  Passed  [email protected]/Qwe…
Performance Matrix 4×1
Scene               Name                            onteripaul@gma…
safety-protocol     Explaining Safety Basics        0.545
lab-notebook-entry  Evening Lab Notebook (long-fo…  0.653
conference-draft    Symposium Talk Summary (long-…  0.785
funding-news        Reacting to Grant Approval      0.828
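
The overall performance figure can be cross-checked against the four scene scores listed above. The sketch below assumes the overall figure is an unweighted mean of the scene scores; the report does not say how it is aggregated, and it does not state the pass/fail cutoff either. The variable names are illustrative only.

```python
from statistics import mean

# Scores and results copied from the Scene Results listing in this report.
scenes = {
    "safety-protocol": (0.545, "Failed"),
    "lab-notebook-entry": (0.653, "Failed"),
    "conference-draft": (0.785, "Failed"),
    "funding-news": (0.828, "Passed"),
}

# Assumption: the overall figure is an unweighted mean of the scene scores.
average = mean(score for score, _ in scenes.values())

# The cutoff is not stated in the report; the results only bound it.
highest_failed = max(s for s, r in scenes.values() if r == "Failed")
lowest_passed = min(s for s, r in scenes.values() if r == "Passed")

print(f"mean of scene scores: {average:.5f}")
print(f"a single cutoff would lie in ({highest_failed}, {lowest_passed}]")
```

Under that assumption the mean agrees with the reported 0.703 (shown as 0.70 in Quick Stats) to within rounding. The three failures and the single pass are consistent with one score cutoff somewhere above 0.785 and at or below 0.828.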