AI Safety | Subhadip Mitra

Jan 31, 2026	Circuit Tracing for the Rest of Us: From Probes to Attribution Graphs and What It Means for Production Safety
Dec 20, 2025	I Trained Probes to Catch AI Models Sandbagging
Oct 11, 2025	Building Safer AI: Industry Response and the Path Forward - (Part 4/4)
Oct 07, 2025	Alignment Faking: When AI Pretends to Change - (Part 3/4)
Oct 03, 2025	Deliberative Alignment: Can We Train AI Not to Scheme? - (Part 2/4)
Sep 30, 2025	The Observer Effect in AI: When Models Know They're Being Tested - (Part 1/4)