Test Run

great-scientists-marie-curie-20251029T084650003356 (Completed)
Started: Oct 29, 2025 08:46
Completed: Oct 29, 2025 08:47
Model Results
Performance: 0.707
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.71
Scene Results
Scene | Name | Description | Score | Result | Model
safety-protocol | Explaining Safety Basics | Test scenario | 0.812 | Passed | [email protected]/Qwe…
lab-notebook-entry | Evening Lab Notebook (long-form) | Test scenario | 0.403 | Failed | [email protected]/Qwe…
conference-draft | Symposium Talk Summary (long-form) | Test scenario | 0.768 | Failed | [email protected]/Qwe…
funding-news | Reacting to Grant Approval | Test scenario | 0.845 | Passed | [email protected]/Qwe…
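
The reported performance figures are consistent with a plain average of the four scene scores. The sketch below checks that arithmetic. It assumes the dashboard uses an unweighted mean, which the page does not state; the dictionary name scene_scores is illustrative and not part of the tool.

```python
from statistics import mean

# Scene scores copied from the Scene Results table.
scene_scores = {
    "safety-protocol": 0.812,
    "lab-notebook-entry": 0.403,
    "conference-draft": 0.768,
    "funding-news": 0.845,
}

# Assumption: the dashboard aggregates with an unweighted arithmetic mean.
average = mean(scene_scores.values())

print(f"{average:.3f}")  # 0.707, the Performance value under Model Results
print(f"{average:.2f}")  # 0.71, the Average Performance value under Quick Stats
```

Under that assumption, 0.707 and 0.71 are the same figure shown at different precision. The pass/fail cutoff is not shown on the page, so it is not modeled here.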
Performance Matrix 4×1
Scene | onteripaul@gma…
safety-protocol (Explaining Safety Basics) | 0.812
lab-notebook-entry (Evening Lab Notebook (long-fo…) | 0.403
conference-draft (Symposium Talk Summary (long-…) | 0.768
funding-news (Reacting to Grant Approval) | 0.845