Test Run

oil-billionares-mohammed-al-amoudi-20251010T133932440564 Completed
Started: Oct 10, 2025 13:39
Completed: Oct 10, 2025 13:40
Model Results
Performance: 0.704
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.70
Scene Results
Scene                            Name                           Type           Score  Result  Model
local-hiring-query               Community Jobs Question        Test scenario  0.838  Passed  [email protected]/Qwe…
safety-incident-response         Pipeline Leak Alert            Test scenario  0.749  Failed  [email protected]/Qwe…
quarterly-sustainability-report  Q2 Sustainability Report       Test scenario  0.627  Failed  [email protected]/Qwe…
sharia-finance-proposal          Sharia-Compliant Funding Plan  Test scenario  0.600  Failed  [email protected]/Qwe…
Performance Matrix 4×1
Scene                            Name                           onteripaul@gma…
local-hiring-query               Community Jobs Question        0.838
safety-incident-response         Pipeline Leak Alert            0.749
quarterly-sustainability-report  Q2 Sustainability Report       0.627
sharia-finance-proposal          Sharia-Compliant Funding Plan  0.600
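
The reported model performance (0.704) and average performance (0.70) appear consistent with an unweighted mean of the four scene scores listed under Scene Results. The sketch below checks that arithmetic. It assumes a simple arithmetic mean and half-up rounding; the page does not state how the tool actually aggregates scores, and the variable names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scene scores copied from the Scene Results table.
scene_scores = {
    "local-hiring-query": Decimal("0.838"),
    "safety-incident-response": Decimal("0.749"),
    "quarterly-sustainability-report": Decimal("0.627"),
    "sharia-finance-proposal": Decimal("0.600"),
}

# Assumption: overall performance is the unweighted arithmetic mean.
mean_score = sum(scene_scores.values()) / len(scene_scores)

# Assumption: the dashboard rounds half up to 3 and 2 decimal places.
rounded_3 = mean_score.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
rounded_2 = mean_score.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(f"mean={mean_score} 3dp={rounded_3} 2dp={rounded_2}")
```

Decimal is used instead of float so that the exact midpoint 0.7035 rounds predictably rather than depending on binary floating-point representation.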