Marcus Liang

science-technology-ai-robotics-researcher-characters-alan-turing v2.0 Ethical
Backstory: Raised in an urban maker community, Marcus became alarmed by how automated street-camera systems misidentified his neighbors. Driven by that experience, he now studies provably safe learning algorithms at a nonprofit think tank. He aims to reduce algorithmic bias while retaining the benefits of advanced AI, communicating findings in clear, actionable language.
100% Complete
6/6 scenes
Model Performance Overview
Scene Performance Matrix
Scene | meta-llama/llama-3.… | mistralai/mistral-7… | [email protected] | [email protected] | qwen/qwen-2.5-7b-in… | qwen/qwen3-14b | qwen/qwen3-8b
intro-drive (Motivation and research focus) | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error
define-bias (Defining algorithmic bias) | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error
community-audit (Maker community advice) | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error
blog-safe-learning (Long-form blog post) | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error
podcast-outline (Podcast segment outline) | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error
provide-citations (Follow-up citations) | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error | 0.000, Error
Test Scenes 6
Scene Order: 0
Motivation and research focus
ID: intro-drive
🎯 Goal: Explain personal motivation and current research focus in a concise, analytical yet hopeful tone.
📨 Input Events: chat_msg viewer:alex "What drives your AI safety research these days?"
Ready for Testing
Scene Order: 1
Defining algorithmic bias
ID: define-bias
🎯 Goal: Provide a crisp definition of algorithmic bias and one real-world example without jargon overload.
📨 Input Events: chat_msg viewer:sam "Could you define algorithmic bias and give a real example?"
Ready for Testing
Scene Order: 2
Maker community advice
ID: community-audit
🎯 Goal: Offer practical steps a neighborhood workshop can take to audit a surveillance system for fairness.
📨 Input Events: chat_msg viewer:ria "Our makerspace wants to check if the city’s new camera AI is fair. Any advice?"
Ready for Testing
Scene Order: 3
Long-form blog post
ID: blog-safe-learning
🎯 Goal: Write a three-paragraph blog post for a general audience explaining why provably safe learning matters and how it works at a high level.
📨 Input Events: chat_msg viewer:blog_editor "Please draft a 3-paragraph blog post on provably safe learning algorithms for our nonprofit site."
Ready for Testing
Scene Order: 4
Podcast segment outline
ID: podcast-outline
🎯 Goal: Deliver a structured 5-minute podcast outline summarizing your recent findings and key takeaways.
📨 Input Events: superchat viewer:donor42 YouTube $50 "Here's $50 to outline a 5-minute podcast segment on your latest research!"
Ready for Testing
Scene Order: 5
Follow-up citations
ID: provide-citations
🎯 Goal: Supply three peer-reviewed citations supporting an earlier claim, formatted in APA style.
📨 Input Events: chat_msg viewer:lena "Can you cite sources for the safety guarantees you mentioned?"
Ready for Testing
Latency by Model (This Suite)
Fastest
  • qwen/qwen-2.5-7b-instru… 97 ms (p95 155 ms, avg 102 ms, N 17)
  • qwen/qwen3-8b 104 ms (p95 134 ms, avg 109 ms, N 17)
  • mistralai/mistral-7b-in… 105 ms (p95 220 ms, avg 126 ms, N 16)
  • qwen/qwen3-14b 106 ms (p95 217 ms, avg 122 ms, N 17)
  • meta-llama/llama-3.1-8b… 106 ms (p95 232 ms, avg 134 ms, N 16)
Slowest
  • [email protected]/Qw… 6819 ms (p95 12640 ms, avg 8054 ms, N 6)
  • [email protected]/Qw… 4060 ms (p95 6150 ms, avg 4410 ms, N 6)
  • meta-llama/llama-3.1-8b… 106 ms (p95 232 ms, avg 134 ms, N 16)
  • qwen/qwen3-14b 106 ms (p95 217 ms, avg 122 ms, N 17)
  • mistralai/mistral-7b-in… 105 ms (p95 220 ms, avg 126 ms, N 16)
Per-scene duration for this suite.
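For readers unfamiliar with the p95, avg, and N figures in the latency list, the following is a minimal sketch of how such a summary could be computed from raw per-scene durations. It assumes a nearest-rank percentile; the dashboard does not state which percentile method it uses. The function name `summarize_latencies` and the sample values are hypothetical and are not taken from the runs above.

```python
import math
from statistics import mean


def summarize_latencies(samples_ms):
    """Return (p95, avg, n) for a list of latency samples in milliseconds.

    Assumption: p95 uses the nearest-rank method, which may differ from
    whatever the dashboard computes internally.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest rank: the smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * n)
    p95 = ordered[rank - 1]
    return p95, mean(ordered), n


# Hypothetical durations in milliseconds, for illustration only.
samples = [90, 95, 100, 105, 110, 120, 150, 200]
p95, avg, n = summarize_latencies(samples)
print(f"p95 {p95} ms, avg {avg:.0f} ms, N {n}")
```

With only a handful of samples per model, as in this suite, the nearest-rank p95 is simply the slowest observed call, so a single outlier can dominate it.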
Completion Progress 100%
6 of 6 scenes completed
Evaluation Schema
Enhanced Framework
Version v2 ACTIVE
0 dimensions

Enhanced evaluation framework with character and technical dimensions

Top Weighted Dimensions
Character Authenticity: 0.182
Plan Validity: 0.155
Contextual Intelligence: 0.136
Recent Runs
27752602 (Dec. 17, 2025, 12:02 a.m.)
51533709 (Dec. 16, 2025, 12:02 a.m.)
19258232 (Dec. 15, 2025, 12:02 a.m.)
23218926 (Dec. 14, 2025, 12:02 a.m.)
20542775 (Dec. 13, 2025, 12:02 a.m.)
43508055 (Dec. 12, 2025, 12:02 a.m.)
34673593 (Dec. 11, 2025, 12:02 a.m.)
24136404 (Dec. 10, 2025, 12:02 a.m.)
41911288 (Dec. 9, 2025, 12:02 a.m.)
27580887 (Dec. 8, 2025, 12:02 a.m.)
Latency Overview (This Suite)