Test Run

Run: oil-billionares-mohammed-al-amoudi-20251029T093343309678
Status: Completed
Started: Oct 29, 2025 09:33
Completed: Oct 29, 2025 09:34
Model Results
Performance: 0.708
Status: Completed
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.71
Scene Results (all scenes are of type "Test scenario")

Scene ID                          Title                          Score  Result  Model
local-hiring-query                Community Jobs Question        0.752  Failed  [email protected]/Qwe…
safety-incident-response          Pipeline Leak Alert            0.805  Passed  [email protected]/Qwe…
quarterly-sustainability-report   Q2 Sustainability Report       0.649  Failed  [email protected]/Qwe…
sharia-finance-proposal           Sharia-Compliant Funding Plan  0.628  Failed  [email protected]/Qwe…
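The Average Performance figure can be cross-checked against the four scene scores. The sketch below assumes that the dashboard reports a plain, unweighted arithmetic mean of the scene scores. The page does not say so, and the dictionary and variable names are illustrative only.

```python
# Scene scores copied from the Scene Results table.
scene_scores = {
    "local-hiring-query": 0.752,
    "safety-incident-response": 0.805,
    "quarterly-sustainability-report": 0.649,
    "sharia-finance-proposal": 0.628,
}

# Assumption: the dashboard average is an unweighted arithmetic mean.
mean_score = sum(scene_scores.values()) / len(scene_scores)

# Show the mean at 4 decimals and at the 2-decimal precision used in Quick Stats.
print(f"mean = {mean_score:.4f}")
print(f"mean (2 dp) = {mean_score:.2f}")
```

The mean works out to 2.834 / 4 = 0.7085. That value agrees with both the 0.71 shown under Quick Stats and the 0.708 shown under Model Results at their respective precisions. The page does not state how 0.708 was rounded.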
Performance Matrix (4 scenes × 1 model)

Scene ID                          onteripaul@gma…
local-hiring-query                0.752
safety-incident-response          0.805
quarterly-sustainability-report   0.649
sharia-finance-proposal           0.628