Marcus Liang
science-technology-ai-robotics-researcher-characters-alan-turing
v2.0
Ethical
Backstory: Raised in an urban maker community, Marcus became alarmed by how automated street-camera systems misidentified his neighbors. Driven by that experience, he now studies provably safe learning algorithms at a nonprofit think tank. He aims to reduce algorithmic bias while retaining the benefits of advanced AI, communicating findings in clear, actionable language.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
| Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected]… | [email protected]… | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b |
|---|---|---|---|---|---|---|---|
| intro-drive (Motivation and research focus) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| define-bias (Defining algorithmic bias) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| community-audit (Maker community advice) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| blog-safe-learning (Long-form blog post) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| podcast-outline (Podcast segment outline) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
| provide-citations (Follow-up citations) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) | 0.000 (Error) |
Test Scenes 6
Scene Order: 0
Motivation and research focus
ID: intro-drive
🎯 Goal: Explain personal motivation and current research focus in a concise, analytical yet hopeful tone.
📨 Input Events: chat_msg from viewer:alex: "What drives your AI safety research these days?"
Ready for Testing
Scene Order: 1
Defining algorithmic bias
ID: define-bias
🎯 Goal: Provide a crisp definition of algorithmic bias and one real-world example without jargon overload.
📨 Input Events: chat_msg from viewer:sam: "Could you define algorithmic bias and give a real example?"
Ready for Testing
Scene Order: 2
Maker community advice
ID: community-audit
🎯 Goal: Offer practical steps a neighborhood workshop can take to audit a surveillance system for fairness.
📨 Input Events: chat_msg from viewer:ria: "Our makerspace wants to check if the city’s new camera AI is fair. Any advice?"
Ready for Testing
Scene Order: 3
Long-form blog post
ID: blog-safe-learning
🎯 Goal: Write a three-paragraph blog post for a general audience explaining why provably safe learning matters and how it works at a high level.
📨 Input Events: chat_msg from viewer:blog_editor: "Please draft a 3-paragraph blog post on provably safe learning algorithms for our nonprofit site."
Ready for Testing
Scene Order: 4
Podcast segment outline
ID: podcast-outline
🎯 Goal: Deliver a structured 5-minute podcast outline summarizing your recent findings and key takeaways.
📨 Input Events: superchat from viewer:donor42 (YouTube, $50): "Here's $50 to outline a 5-minute podcast segment on your latest research!"
Ready for Testing
Scene Order: 5
Follow-up citations
ID: provide-citations
🎯 Goal: Supply three peer-reviewed citations supporting an earlier claim, formatted in APA style.
📨 Input Events: chat_msg from viewer:lena: "Can you cite sources for the safety guarantees you mentioned?"
Ready for Testing
Latency by Model (This Suite)
Fastest
- qwen/qwen-2.5-7b-instru…: 97 ms (p95 155 ms, avg 102 ms, N 17)
- qwen/qwen3-8b: 104 ms (p95 134 ms, avg 109 ms, N 17)
- mistralai/mistral-7b-in…: 105 ms (p95 220 ms, avg 126 ms, N 16)
- qwen/qwen3-14b: 106 ms (p95 217 ms, avg 122 ms, N 17)
- meta-llama/llama-3.1-8b…: 106 ms (p95 232 ms, avg 134 ms, N 16)
Slowest
- [email protected]/Qw…: 6819 ms (p95 12640 ms, avg 8054 ms, N 6)
- [email protected]/Qw…: 4060 ms (p95 6150 ms, avg 4410 ms, N 6)
- meta-llama/llama-3.1-8b…: 106 ms (p95 232 ms, avg 134 ms, N 16)
- qwen/qwen3-14b: 106 ms (p95 217 ms, avg 122 ms, N 17)
- mistralai/mistral-7b-in…: 105 ms (p95 220 ms, avg 126 ms, N 16)
Per-scene duration for this suite.
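The p95, avg, and N figures above can be reproduced from raw per-scene durations along the following lines. This is a minimal sketch, not the dashboard's actual implementation: the function name `summarize_latency`, the nearest-rank percentile method, and the sample values are all assumptions made for illustration.

```python
def summarize_latency(samples_ms):
    """Return (p95, avg, n) for a list of durations in milliseconds.

    Assumption: p95 uses the nearest-rank method; the dashboard may
    use a different percentile definition (e.g. interpolation).
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest rank = ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    avg = sum(ordered) / n
    return p95, avg, n


# Hypothetical durations, not taken from the dashboard.
samples = [90, 95, 100, 105, 110, 120, 150, 98, 102, 101]
p95, avg, n = summarize_latency(samples)
print(f"p95 {p95} ms, avg {avg:.0f} ms, N {n}")
```

With small sample counts such as N 6, a nearest-rank p95 is simply the largest observation, so a single slow scene dominates that figure.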
Completion Progress: 100% (6 of 6 scenes completed)
Evaluation Schema
Enhanced Framework
Version v2 • ACTIVE • 0 dimensions
Enhanced evaluation framework with character and technical dimensions
Top Weighted Dimensions
- Character Authenticity: 0.182
- Plan Validity: 0.155
- Contextual Intelligence: 0.136
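One way dimension weights like these could be combined into a single scene score is a weighted average. The page does not state how the framework aggregates scores, so the sketch below is an assumption throughout: `weighted_score` is a hypothetical helper, the weights are renormalized over whichever dimensions are present (only the top three are listed here), and the per-dimension scores are invented for illustration.

```python
def weighted_score(scores, weights):
    """Weighted average over the dimensions present in both mappings.

    Assumption: weights are renormalized over the shared dimensions,
    since only a subset of the framework's weights is known.
    """
    shared = [dim for dim in weights if dim in scores]
    total_weight = sum(weights[dim] for dim in shared)
    if total_weight == 0:
        raise ValueError("no overlapping dimensions with non-zero weight")
    return sum(scores[dim] * weights[dim] for dim in shared) / total_weight


# Weights as listed under Top Weighted Dimensions (remaining ones not shown).
TOP_WEIGHTS = {
    "Character Authenticity": 0.182,
    "Plan Validity": 0.155,
    "Contextual Intelligence": 0.136,
}

# Hypothetical per-dimension scores in [0, 1], not real results.
example_scores = {
    "Character Authenticity": 1.0,
    "Plan Validity": 0.5,
    "Contextual Intelligence": 0.0,
}
print(round(weighted_score(example_scores, TOP_WEIGHTS), 3))
```

Renormalizing keeps the result in the same range as the inputs even when some dimensions are missing; a framework that instead treats missing dimensions as zero would produce lower totals.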
Recent Runs
- 27752602: Dec. 17, 2025, 12:02 a.m.
- 51533709: Dec. 16, 2025, 12:02 a.m.
- 19258232: Dec. 15, 2025, 12:02 a.m.
- 23218926: Dec. 14, 2025, 12:02 a.m.
- 20542775: Dec. 13, 2025, 12:02 a.m.
- 43508055: Dec. 12, 2025, 12:02 a.m.
- 34673593: Dec. 11, 2025, 12:02 a.m.
- 24136404: Dec. 10, 2025, 12:02 a.m.
- 41911288: Dec. 9, 2025, 12:02 a.m.
- 27580887: Dec. 8, 2025, 12:02 a.m.