Resources

Nail down the fundamentals and explore the community.

Foundational

Robert Miles · 2021 · YouTube

A clear, accessible introduction to core AI safety concepts and motivations.

New York Times · 2023 · Article

Industry leaders and experts sign a statement warning of existential risk from AI.

Holden Karnofsky · Blog Series

In-depth explorations of AI risks and alignment challenges from a co-founder of Open Philanthropy.

Ajeya Cotra & Kelsey Piper · Blog

High-level perspectives on AI safety concerns from an Open Philanthropy researcher and a Vox journalist.

Joe Carlsmith · 2023 · Essay

A rigorous analysis of the existential risk posed by power-seeking advanced AI systems.

MIT Technology Review · 2023 · Interview

The "godfather of AI" explains why he left Google and his concerns about the technology he helped create.

Anthropic · Research Collection

A collection of articles on interpretability: reverse-engineering what neural networks compute from their weights.

Neel Nanda · Guide

A beginner's guide to getting started in mechanistic interpretability, with concrete steps.

Neel Nanda · Python Library

A Python library for mechanistic interpretability of GPT-2-style language models.

Anthropic · 2022 · Paper

Harmlessness from AI feedback: using a written set of principles (a "constitution") to guide model behavior.

DeepMind · 2023 · Paper

A framework for identifying and evaluating novel AI risks before deployment.

DeepMind · 2022 · Paper

An analysis of goal misgeneralization: how AI systems can retain their capabilities while pursuing unintended objectives.

AI Safety · Programs

Discover upcoming AI safety events, workshops, and training programs to develop your skills.