Test Run
finance-economics-tax-consultant-characters-adam-smith-20251029T084103788294
Status: Completed
Started: Oct 29, 2025 08:41
Completed: Oct 29, 2025 08:41
Model Results

| Model | Performance | Status |
|---|---|---|
| [email protected]/Qwen3-8B-b0d7af1f (AI Language Model) | 0.518 | Completed |
Run Details
Judge Model: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Generator Models: 1
Execution Time: 0 minutes
Quick Stats
Models Tested: 1
Scenes Executed: 4
Average Performance: 0.52
Scene Results

| Scene | Name | Score | Result | Model |
|---|---|---|---|---|
| us-to-canada-royalties | Treaty royalty withholding | 0.492 | Failed | [email protected]/Qwe… |
| community-workshop-itin | Immigrant community tax workshop | 0.503 | Failed | [email protected]/Qwe… |
| germany-subsidiary-memo | German subsidiary planning memo | 0.615 | Failed | [email protected]/Qwe… |
| mentoring-email-transfer-pricing | Mentoring email on transfer pricing | 0.462 | Failed | [email protected]/Qwe… |
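The report does not state how the model-level score of 0.518 (shown as 0.52 in Quick Stats) is aggregated. Assuming it is the unweighted mean of the four scene scores, the figure can be checked with a short sketch like the one below; the variable names are illustrative only.

```python
# Scene scores as reported in the Scene Results table.
scene_scores = {
    "us-to-canada-royalties": 0.492,
    "community-workshop-itin": 0.503,
    "germany-subsidiary-memo": 0.615,
    "mentoring-email-transfer-pricing": 0.462,
}

# Assumption: the model-level performance is the unweighted mean of its scene scores.
average = sum(scene_scores.values()) / len(scene_scores)

print(f"{average:.3f}")  # 0.518, matching the Model Results table
print(f"{average:.2f}")  # 0.52, matching Average Performance in Quick Stats
```

Under this assumption, every scene contributes equally, so the single higher score on the German subsidiary memo (0.615) lifts the average only modestly above the other three.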
Performance Matrix (4 scenes × 1 model)

| Scene | onteripaul@gma… |
|---|---|
| us-to-canada-royalties (Treaty royalty withholding) | 0.492 |
| community-workshop-itin (Immigrant community tax works…) | 0.503 |
| germany-subsidiary-memo (German subsidiary planning me…) | 0.615 |
| mentoring-email-transfer-pricing (Mentoring email on transfer p…) | 0.462 |