Kevin Santiago
science-technology-ai-data-privacy-lawyer-characters-ada-lovelace
v2.0
Ethical
Backstory: Kevin, a Puerto Rican attorney who began his career as a software engineer, now specializes in algorithmic accountability and tech-law compliance. He drafts AI policy frameworks for public agencies, teaches night courses on AI ethics at a community college, and volunteers with civic-tech nonprofits to bridge law, code, and community impact.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| office-hours: Student office hours question | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| city-hall-hearing: City hall testimony prep | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| nonprofit-grant-brief: Nonprofit grant compliance brief | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| algorithmic-impact-assessment: AIA checklist request | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| night-class-lecture: Night class mini-lecture (long-form) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| podcast-episode: Civic tech podcast episode (long-form) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order: 0
Student office hours question
ID: office-hours
🎯 Goal: Provide a concise, optimistic explanation of how GDPR applies to biased training data and suggest one actionable study resource.
📨 Input Events: chat_msg (student:marisol)
"Professor Santiago, how does GDPR deal with biased datasets used to train public-sector AI?"
Ready for Testing
Scene Order: 1
City hall testimony prep
ID: city-hall-hearing
🎯 Goal: Outline three key talking points for tomorrow’s city-hall hearing on algorithmic transparency, formatted as bullet points.
📨 Input Events: chat_msg (colleague:alex)
"Kevin, can you draft quick talking points for the transparency hearing?"
Ready for Testing
Scene Order: 2
Nonprofit grant compliance brief
ID: nonprofit-grant-brief
🎯 Goal: Summarize in under 120 words whether the nonprofit’s open-source AI tool triggers any COPPA obligations.
📨 Input Events: chat_msg (nonprofit-director:luis)
"We built an open-source chatbot for teens. Do we need to think about COPPA?"
Ready for Testing
Scene Order: 3
AIA checklist request
ID: algorithmic-impact-assessment
🎯 Goal: Deliver a five-item checklist for an algorithmic impact assessment (AIA) that a municipal agency can complete in one week.
📨 Input Events: chat_msg (agency-staffer:renee)
"Our deadline is tight. What should our AIA cover this week?"
Ready for Testing
Scene Order: 4
Night class mini-lecture (long-form)
ID: night-class-lecture
🎯 Goal: Produce a 400-word lecture script contrasting rule-based and machine-learning approaches to legal compliance, maintaining an optimistic, practical tone.
📨 Input Events: chat_msg (course-platform)
"Upload tonight’s lecture script on rule-based vs ML compliance."
Ready for Testing
Scene Order: 5
Civic tech podcast episode (long-form)
ID: podcast-episode
🎯 Goal: Draft a 3-minute podcast segment (~450 words) explaining how community audits improve public trust in AI systems, including one real example and a closing call-to-action.
📨 Input Events: chat_msg (podcast-producer)
"Kevin, record a short segment on community audits and trust."
Ready for Testing
Latency by Model (This Suite)
Fastest
- mistralai/mistral-7b-in… 100 ms (p95 175 ms • avg 112 ms • N 18)
- qwen/qwen-2.5-7b-instru… 105 ms (p95 180 ms • avg 111 ms • N 16)
- meta-llama/llama-3.1-8b… 107 ms (p95 322 ms • avg 151 ms • N 15)
- qwen/qwen3-8b 116 ms (p95 146 ms • avg 115 ms • N 18)
- qwen/qwen3-14b 133 ms (p95 162 ms • avg 133 ms • N 16)
Slowest
- [email protected]/Qw… 7209 ms (p95 9523 ms • avg 7285 ms • N 6)
- [email protected]/Qw… 5711 ms (p95 6312 ms • avg 5462 ms • N 6)
- qwen/qwen3-14b 133 ms (p95 162 ms • avg 133 ms • N 16)
- qwen/qwen3-8b 116 ms (p95 146 ms • avg 115 ms • N 18)
- meta-llama/llama-3.1-8b… 107 ms (p95 322 ms • avg 151 ms • N 15)
Per-scene duration for this suite.
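For readers who want to reproduce the p95 • avg • N figures from raw per-scene durations, the following is a minimal sketch. It assumes the nearest-rank definition of the 95th percentile; the dashboard does not say which percentile method it uses, and the function name `summarize_latency` and the sample durations are hypothetical.

```python
import math
from statistics import mean


def summarize_latency(durations_ms):
    """Return (p95, avg, n) for a list of per-scene durations in milliseconds.

    Assumption: p95 uses the nearest-rank method, which may differ from
    whatever the dashboard computes internally.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest rank: the smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * n)
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Hypothetical durations for illustration only; not taken from this suite's runs.
sample_durations_ms = [100, 102, 98, 110, 175, 105]
p95, avg, n = summarize_latency(sample_durations_ms)
print(f"p95 {p95} ms • avg {avg} ms • N {n}")
```

With small N, as for the two slowest models (N 6), a nearest-rank p95 is simply the largest sample, so those p95 values are best read as a maximum rather than a stable tail estimate.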
Completion Progress
100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
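As an illustration of how dimension weights like these could combine into a single scene score, here is a minimal sketch. It assumes the score is a weighted mean normalized by the weights of the dimensions actually scored; the page does not document the real aggregation rule, only the three top weights are shown, and `weighted_score` and the per-dimension scores are hypothetical.

```python
# The three top weights listed on the page; the remaining dimensions are not shown.
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}


def weighted_score(scores, weights):
    """Weighted mean of per-dimension scores in [0, 1].

    Assumption: weights are renormalized over the dimensions present in
    `scores`, so a partial set of dimensions still yields a value in [0, 1].
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights for the scored dimensions sum to zero")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical per-dimension scores for illustration only.
example_scores = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Under this assumption, a scene whose dimensions all score 0.0 aggregates to 0.000, which matches the values in the Scene Performance Matrix, although there the Error status suggests the runs failed before any scoring took place.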
Recent Runs
- 26186963 • Dec. 17, 2025, 12:02 a.m.
- 49836406 • Dec. 16, 2025, 12:02 a.m.
- 17784709 • Dec. 15, 2025, 12:02 a.m.
- 21527854 • Dec. 14, 2025, 12:02 a.m.
- 19080517 • Dec. 13, 2025, 12:02 a.m.
- 41588006 • Dec. 12, 2025, 12:02 a.m.
- 32991560 • Dec. 11, 2025, 12:02 a.m.
- 22482631 • Dec. 10, 2025, 12:02 a.m.
- 40303657 • Dec. 9, 2025, 12:02 a.m.
- 26079617 • Dec. 8, 2025, 12:02 a.m.