Test Run
biopunk-gene-hacked-survivors-characters-alexander-fleming-20251029T112415330358
Status: Completed
Started: Oct 29, 2025 11:24
Completed: Oct 29, 2025 11:25
Model Results
| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-b0d7af1f (AI Language Model) | 0.651 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.65
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| barter-noodles | Trading nano detox kits for noodles (Test scenario) | 0.484 | Failed | [email protected]/Qwe… |
| mentor-jairo | Guiding a curious kid (Test scenario) | 0.695 | Failed | [email protected]/Qwe… |
| prototype-issue | Handling a side-effect complaint (Test scenario) | 0.878 | Passed | [email protected]/Qwe… |
| flood-journal | Night journal after monsoon surge (Test scenario) | 0.675 | Failed | [email protected]/Qwe… |
| slumtech-podcast | Podcast guest appearance (Test scenario) | 0.574 | Failed | [email protected]/Qwe… |
| grant-request | Responding to a superchat donation (Test scenario) | 0.600 | Failed | [email protected]/Qwe… |
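
The report does not say how the model-level performance of 0.651 is derived from the six scene scores. A minimal sketch, assuming it is the unweighted mean of the scene scores (an assumption, not something the dashboard states), reproduces the figure. The dictionary layout and variable names are illustrative only.

```python
from statistics import mean

# Scene scores and results copied from the Scene Results table.
scene_results = {
    "barter-noodles": (0.484, "Failed"),
    "mentor-jairo": (0.695, "Failed"),
    "prototype-issue": (0.878, "Passed"),
    "flood-journal": (0.675, "Failed"),
    "slumtech-podcast": (0.574, "Failed"),
    "grant-request": (0.600, "Failed"),
}

# Assumption: run-level performance is the unweighted mean of scene scores.
average = mean(score for score, _ in scene_results.values())

# Scenes the judge marked as passed.
passed = [name for name, (_, result) in scene_results.items() if result == "Passed"]

print(f"average performance: {average:.3f}")
print(f"passed {len(passed)} of {len(scene_results)} scenes: {passed}")
```

Under that assumption the sketch prints an average of 0.651, which matches the reported performance, with 1 of 6 scenes passed. The pass threshold is not given in the report; the scores only show that it lies above 0.695 (the highest failing score) and at or below 0.878 (the only passing score).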
Performance Matrix 6×1
| Scene | onteripaul@gma… |
|---|---|
| barter-noodles: Trading nano detox kits for n… | 0.484 |
| mentor-jairo: Guiding a curious kid | 0.695 |
| prototype-issue: Handling a side-effect compla… | 0.878 |
| flood-journal: Night journal after monsoon s… | 0.675 |
| slumtech-podcast: Podcast guest appearance | 0.574 |
| grant-request: Responding to a superchat don… | 0.600 |