Test Run

Run: oil-billionares-mohammed-al-amoudi-20251010T113252989091
Status: Completed
Started: Oct 10, 2025 11:32
Completed: Oct 10, 2025 11:33
Model Results
Performance: 0.682
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.68
Scene Results (each scene is described as "Test scenario")
Scene ID | Scene Name | Score | Result | Model
local-hiring-query | Community Jobs Question | 0.741 | Failed | [email protected]/Qwe…
safety-incident-response | Pipeline Leak Alert | 0.591 | Failed | [email protected]/Qwe…
quarterly-sustainability-report | Q2 Sustainability Report | 0.809 | Passed | [email protected]/Qwe…
sharia-finance-proposal | Sharia-Compliant Funding Plan | 0.587 | Failed | [email protected]/Qwe…
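The overall performance figure (0.682, shown rounded as 0.68) is consistent with a plain unweighted mean of the four scene scores: (0.741 + 0.591 + 0.809 + 0.587) / 4 = 2.728 / 4 = 0.682. The dashboard does not say how the average is computed, so the unweighted mean is an assumption. The sketch below reproduces that check from the scene table and also counts passing scenes.

```python
# Scene scores and results as listed in the Scene Results table.
scenes = {
    "local-hiring-query": (0.741, "Failed"),
    "safety-incident-response": (0.591, "Failed"),
    "quarterly-sustainability-report": (0.809, "Passed"),
    "sharia-finance-proposal": (0.587, "Failed"),
}

# Assumption: "Average Performance" is the unweighted mean of the scene scores.
scores = [score for score, _ in scenes.values()]
average = sum(scores) / len(scores)

# Count scenes whose result is "Passed".
passed = sum(1 for _, result in scenes.values() if result == "Passed")

print(f"Average performance: {average:.3f}")  # Average performance: 0.682
print(f"Scenes passed: {passed}/{len(scenes)}")  # Scenes passed: 1/4
```

Only one of the four scenes passed, even though the mean score is 0.682; the pass/fail threshold itself is not shown on the page.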
Performance Matrix 4×1 (4 scenes × 1 model)
Scene | onteripaul@gma…
local-hiring-query (Community Jobs Question) | 0.741
safety-incident-response (Pipeline Leak Alert) | 0.591
quarterly-sustainability-report (Q2 Sustainability Report) | 0.809
sharia-finance-proposal (Sharia-Compliant Funding Plan) | 0.587