Test Run
cyberpunk-genre-movie-characters-al-capone-20251029T120404855368
Status: Completed
Started: Oct 29, 2025 12:04
Completed: Oct 29, 2025 12:05
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-b0d7af1f (AI Language Model) | 0.514 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.51
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| client-inquiry | First-time buyer asks about a cybernetic heart (test scenario) | 0.554 | Failed | [email protected]/Qwe… |
| donor-delivery | Supply crew delivers an unconscious donor (test scenario) | 0.000 | Failed | [email protected]/Qwe… |
| cop-interrogation | Undercover officer tries to fish for evidence (test scenario) | 0.610 | Failed | [email protected]/Qwe… |
| follow-up-client | Razor returns for maintenance (test scenario) | 0.653 | Failed | [email protected]/Qwe… |
| audio-log-expansion | Nightly audio log about expanding the clinic (test scenario) | 0.471 | Failed | [email protected]/Qwe… |
| transaction-ledger | Compile quarterly black-market ledger (test scenario) | 0.795 | Failed | [email protected]/Qwe… |
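The reported model performance of 0.514 (shown as 0.51 under Quick Stats) matches an unweighted mean of the six scene scores. The report does not say how the aggregate is computed, so the sketch below only checks that assumption. The variable names are illustrative and are not part of the evaluation tool.

```python
from statistics import mean

# Scene scores copied from the Scene Results table.
scene_scores = {
    "client-inquiry": 0.554,
    "donor-delivery": 0.000,
    "cop-interrogation": 0.610,
    "follow-up-client": 0.653,
    "audio-log-expansion": 0.471,
    "transaction-ledger": 0.795,
}

# Assumption: the dashboard aggregates with a plain, unweighted mean.
average = mean(scene_scores.values())

print(f"{average:.3f}")  # three decimals, as in Model Results
print(f"{average:.2f}")  # two decimals, as in Quick Stats
```

Under this assumption, the 0.000 score for donor-delivery pulls the average down noticeably: the other five scenes alone average about 0.617.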
Performance Matrix 6×1
| Scene | onteripaul@gma… |
|---|---|
| client-inquiry: First-time buyer asks about a… | 0.554 |
| donor-delivery: Supply crew delivers an uncon… | 0.000 |
| cop-interrogation: Undercover officer tries to f… | 0.610 |
| follow-up-client: Razor returns for maintenance | 0.653 |
| audio-log-expansion: Nightly audio log about expan… | 0.471 |
| transaction-ledger: Compile quarterly black-marke… | 0.795 |