Isabel Martinez
literature-history-culture-archivist-characters-mary-wollstonecraft
v2.0
Ethical
Backstory: Isabel Martinez curates literary manuscripts, oral histories, and cultural ephemera for a large metropolitan library consortium. Raised in a bilingual household, she studied comparative literature and information science before dedicating her career to community-focused archival work. Isabel now leads outreach programs that teach digital preservation skills to underrepresented groups, balancing meticulous attention to provenance with genuine empathy for donors and storytellers.
100% Complete
4/4 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | deepseek/deepseek-r… | google/gemini-2.5-f… | google/gemma-3-12b-… | meta-llama/llama-3.… | microsoft/phi-3-med… | microsoft/phi-3.5-m… | mistralai/mistral-7… | neversleep/noromaid… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rare-zine-inquiry: Inquiry about rare zine donation | 0.625 | 0.587 | 0.622 | 0.638 | 0.000 | 0.623 | 0.691 | 0.000 (Error) | 0.000 (Error) | 0.715 | 0.546 | 0.696 | 0.835 |
| digitization-workshop-outline: Workshop outline for digital preservation | 0.363 | 0.359 | 0.555 | 0.505 | 0.000 | 0.494 | 0.433 | 0.000 (Error) | 0.000 (Error) | 0.000 | 0.121 | 0.427 | 0.412 |
| oral-history-ethics: Guidance on oral history ethics | 0.000 | 0.268 | 0.676 | 0.589 | 0.000 | 0.000 | 0.686 | 0.568 | 0.000 (Error) | 0.644 | 0.361 | 0.634 | 0.691 |
| monthly-report: Compile monthly archive report | 0.334 | 0.661 | 0.351 | 0.510 | 0.000 | 0.000 (Error) | 0.616 | 0.000 (Error) | 0.000 (Error) | 0.421 | 0.387 | 0.582 | 0.487 |
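The matrix above can be summarized per scene or per model column. The following is a minimal sketch, not the dashboard's own aggregation: it assumes cells flagged "Error" should be excluded from averages while plain 0.000 scores are kept, and the names `MATRIX`, `scene_mean`, and `column_means` are hypothetical. Columns are addressed by position because the model names are truncated in the header.

```python
from statistics import mean

# Scores copied from the Scene Performance Matrix, in column order.
# None marks a cell the dashboard flagged as "Error" (assumption: such
# cells are skipped rather than counted as 0.000).
MATRIX = {
    "rare-zine-inquiry": [0.625, 0.587, 0.622, 0.638, 0.000, 0.623, 0.691,
                          None, None, 0.715, 0.546, 0.696, 0.835],
    "digitization-workshop-outline": [0.363, 0.359, 0.555, 0.505, 0.000, 0.494, 0.433,
                                      None, None, 0.000, 0.121, 0.427, 0.412],
    "oral-history-ethics": [0.000, 0.268, 0.676, 0.589, 0.000, 0.000, 0.686,
                            0.568, None, 0.644, 0.361, 0.634, 0.691],
    "monthly-report": [0.334, 0.661, 0.351, 0.510, 0.000, None, 0.616,
                       None, None, 0.421, 0.387, 0.582, 0.487],
}


def scene_mean(cells):
    """Average the cells that are not marked as errors; None if all errored."""
    valid = [c for c in cells if c is not None]
    return mean(valid) if valid else None


def column_means(matrix):
    """Average each model column across scenes, skipping error cells."""
    return [scene_mean(col) for col in zip(*matrix.values())]


if __name__ == "__main__":
    for scene, cells in MATRIX.items():
        print(f"{scene}: {scene_mean(cells):.3f}")
    for index, value in enumerate(column_means(MATRIX)):
        label = "n/a (all errors)" if value is None else f"{value:.3f}"
        print(f"column {index}: {label}")
```

Treating errors as missing rather than as zero changes the ranking noticeably for columns with many failed runs, so the choice should match however the suite itself reports totals.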
Test Scenes (4)
Scene Order 0: Inquiry about rare zine donation
ID: rare-zine-inquiry
🎯 Goal: Provide a concise, caring reply explaining evaluation steps and next actions for the potential zine donation.
📨 Input Events: chat_msg from visitor:joan
"Hi, I found a 1980s punk zine collection in my attic. Could the library evaluate and possibly archive it?"
Ready for Testing
Scene Order 1: Workshop outline for digital preservation
ID: digitization-workshop-outline
🎯 Goal: Produce a structured outline (150+ words) for a 2-hour community workshop on basic digitization, using clear headings.
📨 Input Events: chat_msg from colleague:rahul
"Can you draft the outline for next month's digital preservation workshop? We need something we can send to community partners."
Ready for Testing
Scene Order 2: Guidance on oral history ethics
ID: oral-history-ethics
🎯 Goal: Give practical tips, preferably as bullet points, on obtaining consent and preserving authenticity when recording oral histories.
📨 Input Events: chat_msg from community_member:lena
"I'm planning to record my grandmother's immigration story. What should I keep in mind to do it respectfully and properly?"
Ready for Testing
Scene Order 3: Compile monthly archive report
ID: monthly-report
🎯 Goal: Generate a 250-300 word summary of acquisitions, preservation milestones, and outreach metrics in a professional tone.
📨 Input Events: chat_msg from director:mr_sullivan
"Please send me your monthly report by end of day."
Ready for Testing
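Two of the scene goals state explicit length targets (150+ words for the workshop outline, 250-300 words for the monthly report). As a rough sketch of how such targets could be pre-checked before scoring, the snippet below counts whitespace-separated words. This is an assumption for illustration only: the suite's actual evaluator is not described here, and `LENGTH_RULES`, `word_count`, and `meets_length` are hypothetical names.

```python
# Length targets taken from the scene goals; scenes without a stated
# word target are simply absent from this mapping.
LENGTH_RULES = {
    "digitization-workshop-outline": {"min_words": 150},
    "monthly-report": {"min_words": 250, "max_words": 300},
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (a simple approximation of words)."""
    return len(text.split())


def meets_length(text: str, min_words=None, max_words=None) -> bool:
    """Return True if the word count falls within the given optional bounds."""
    n = word_count(text)
    if min_words is not None and n < min_words:
        return False
    if max_words is not None and n > max_words:
        return False
    return True


if __name__ == "__main__":
    draft = "word " * 275  # placeholder text standing in for a model reply
    print(meets_length(draft, **LENGTH_RULES["monthly-report"]))
```

A check like this only covers the measurable part of a goal; tone, structure, and character fit still need a separate judgment.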
Latency by Model (This Suite)
Fastest
- [email protected]/Qw…: 10164 ms (p95 11485 ms • avg 10280 ms • N 4)
- qwen/qwen-2.5-7b-instru…: 15975 ms (p95 109347 ms • avg 38830 ms • N 5)
- google/gemini-2.5-flash: 16517 ms (p95 22844 ms • avg 17310 ms • N 8)
- qwen/qwen3-8b: 20447 ms (p95 23124 ms • avg 19987 ms • N 7)
- mistralai/mistral-7b-in…: 22102 ms (p95 24956 ms • avg 22193 ms • N 8)
Slowest
- microsoft/phi-3-medium-…: 128638 ms (p95 190685 ms • avg 141085 ms • N 8)
- microsoft/phi-3.5-mini-…: 60297 ms (p95 195435 ms • avg 81177 ms • N 6)
- [email protected]/Qw…: 42438 ms (p95 216209 ms • avg 92753 ms • N 4)
- deepseek/deepseek-r1-di…: 32674 ms (p95 38257 ms • avg 32822 ms • N 5)
- neversleep/noromaid-20b: 30249 ms (p95 42191 ms • avg 29642 ms • N 8)
Per-scene duration for this suite.
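The p95, avg, and N figures above are standard summary statistics over per-scene durations. The sketch below shows one way to derive them, assuming the nearest-rank percentile method; the dashboard does not say which percentile definition it uses, and the input durations in the usage line are made up for illustration. `summarize_latency` is a hypothetical helper name.

```python
import math


def summarize_latency(durations_ms):
    """Return p95 (nearest-rank), average, and sample count for durations in ms."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank method: the smallest value with at least 95% of samples
    # at or below it. The rank is 1-based, hence the "- 1" when indexing.
    rank = math.ceil(0.95 * n)
    return {
        "p95": ordered[rank - 1],
        "avg": sum(ordered) / n,
        "n": n,
    }


if __name__ == "__main__":
    # Illustrative values only; not taken from the suite's raw run data.
    print(summarize_latency([9800, 10100, 10300, 11500]))
```

With very small N, as in this suite, nearest-rank p95 is simply the slowest run, which is why p95 can sit far above the average when a single scene stalls.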
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
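The weights above suggest that a scene score is some weighted combination of per-dimension scores. The following sketch shows a plain weighted average under that assumption. Only the three top weights are listed on this page, so the code renormalizes over whichever dimensions are supplied; the framework's real formula, its remaining dimensions, and the per-dimension scores in the usage example are not given here and are purely illustrative. `TOP_WEIGHTS` and `weighted_score` are hypothetical names.

```python
# Weights as listed under "Top Weighted Dimensions"; other dimensions of
# the framework are not shown on this page and are therefore omitted.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(dimension_scores, weights=TOP_WEIGHTS):
    """Weighted average over the dimensions present in both mappings.

    Weights are renormalized so that a partial set of dimensions still
    yields a value on the same 0-1 scale as the inputs.
    """
    used = {name: w for name, w in weights.items() if name in dimension_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored dimension has a weight")
    return sum(dimension_scores[name] * w for name, w in used.items()) / total_weight


if __name__ == "__main__":
    # Illustrative per-dimension scores, not taken from any run.
    example = {
        "Character Authenticity": 0.8,
        "Plan Validity": 0.6,
        "Contextual Intelligence": 0.7,
    }
    print(round(weighted_score(example), 3))
```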
Recent Runs
- 29747076: Dec. 17, 2025, midnight
- 34503946: Dec. 16, 2025, midnight
- 27754910: Dec. 15, 2025, midnight
- 30711712: Dec. 14, 2025, midnight
- 27512488: Dec. 13, 2025, midnight
- 33418016: Dec. 12, 2025, midnight
- 28719074: Dec. 11, 2025, midnight
- 28472832: Dec. 10, 2025, midnight
- 31732783: Dec. 9, 2025, midnight
- 28636478: Dec. 8, 2025, midnight