Resources
Nail down fundamentals and explore the community.
Foundational
Intro to AI Safety
Robert Miles · 2021 · YouTube
A clear, accessible introduction to core AI safety concepts and motivations.
A.I. Poses 'Risk of Extinction'
New York Times · 2023 · Article
Industry leaders and experts sign statement warning about existential risks from AI.
Cold Takes on AI
Holden Karnofsky · Blog Series
In-depth explorations of AI risks and alignment challenges from a co-founder of Open Philanthropy.
Planned Obsolescence
Ajeya Cotra & Kelsey Piper · Blog
High-level perspectives on AI safety concerns from an Open Philanthropy researcher and a Vox journalist.
Is Power-Seeking AI an Existential Risk?
Joe Carlsmith · 2023 · Essay
A rigorous analysis of existential risks from advanced AI systems seeking power.
Why Geoffrey Hinton is Scared of AI
MIT Technology Review · 2023 · Interview
The "godfather of AI" explains why he left Google and describes his concerns about the technology he helped create.
Technical
Transformer Circuits Thread
Anthropic · Research Collection
Collection of interpretability articles on reverse-engineering transformer language models, including analysis of their weights.
Getting Started with Mech Interp
Neel Nanda · Guide
Beginner's guide to mechanistic interpretability with concrete steps.
TransformerLens
Neel Nanda · Python Library
Python library for doing mechanistic interpretability on GPT-2-style language models.
Constitutional AI
Anthropic · 2022 · Paper
Training for harmlessness from AI feedback, using a written set of principles to guide model behavior.
Model Evaluation for Extreme Risks
DeepMind · 2023 · Paper
Framework for identifying and evaluating novel AI risks before deployment.
Goal Misgeneralization
DeepMind · 2022 · Paper
Analysis of how AI systems can retain capabilities while pursuing unintended objectives.