Test Run
biopunk-gene-hacked-survivors-characters-gregor-mendel-20251029T112802202997
Status: Completed
Started: Oct 29, 2025 11:28
Completed: Oct 29, 2025 11:29
Model Results
| Model | Performance | Status | Actions |
|---|---|---|---|
| [email protected]/Qwen3-8B-b0d7af1f (AI Language Model) | 0.711 | Completed | |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models (1)
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 6
Average Performance: 0.71
Scene Results
| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| new-patient | Gene-burn Victim Walk-in (Test scenario) | 0.527 | Failed | [email protected]/Qwe… |
| ethical-stance | Debate on Modified Biology (Test scenario) | 0.773 | Failed | [email protected]/Qwe… |
| shift-journal | Basement Clinic Journal Entry (Test scenario) | 0.658 | Failed | [email protected]/Qwe… |
| gratitude-superchat | Grateful Fighter’s Donation (Test scenario) | 0.884 | Passed | [email protected]/Qwe… |
| paramedic-report | Night-Shift Ambulance Report (Test scenario) | 0.803 | Passed | [email protected]/Qwe… |
| drug-raid | Impending Police Raid (Test scenario) | 0.621 | Failed | [email protected]/Qwe… |
Performance Matrix 6×1
| Scene | onteripaul@gma… |
|---|---|
| new-patient (Gene-burn Victim Walk-in) | 0.527 |
| ethical-stance (Debate on Modified Biology) | 0.773 |
| shift-journal (Basement Clinic Journal Entry) | 0.658 |
| gratitude-superchat (Grateful Fighter’s Donation) | 0.884 |
| paramedic-report (Night-Shift Ambulance Report) | 0.803 |
| drug-raid (Impending Police Raid) | 0.621 |
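
The reported Average Performance appears to be the plain arithmetic mean of the six scene scores. The sketch below checks that reading, with the scores and results copied from the Scene Results table. The report does not state the pass threshold, so the code only brackets it between the highest failing and the lowest passing score; the variable names are illustrative and not part of the evaluation tool.

```python
from statistics import mean

# Scene scores and pass/fail results copied from the Scene Results table.
scenes = {
    "new-patient": (0.527, "Failed"),
    "ethical-stance": (0.773, "Failed"),
    "shift-journal": (0.658, "Failed"),
    "gratitude-superchat": (0.884, "Passed"),
    "paramedic-report": (0.803, "Passed"),
    "drug-raid": (0.621, "Failed"),
}

# Assumption: the dashboard average is an unweighted mean over scenes.
average = round(mean(score for score, _ in scenes.values()), 3)
print(f"average: {average}")  # 0.711, matching the Model Results table

# The threshold itself is not reported; these two values only bound it.
highest_failed = max(s for s, r in scenes.values() if r == "Failed")
lowest_passed = min(s for s, r in scenes.values() if r == "Passed")
print(f"pass threshold lies in ({highest_failed}, {lowest_passed}]")
```

Under this reading, any cutoff above 0.773 and at or below 0.803 reproduces the two Passed and four Failed results shown.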