Research

Fundamental interpretability research to understand and intentionally design advanced AI systems

October 28, 2025

Deploying Interpretability to Production with Rakuten: SAE Probes for PII Detection

Nam Nguyen, Myra Deng, Dhruvil Gala, Michael Byun, Kenta Naruse, Felix Giovanni Virgo, Dron Hazra, Liv Gorton, Daniel Balsam, Thomas McGrath

August 28, 2025

Finding the Tree of Life in Evo 2

Michael Pearce, Elana Simon, Michael Byun, Daniel Balsam

August 21, 2025

Discovering Undesired Rare Behaviors via Model Diff Amplification

Santiago Aranguri, Thomas McGrath

August 5, 2025

The Circuits Research Landscape: Results and Perspectives

Jack Lindsey, Emmanuel Ameisen, Neel Nanda, Stepan Shabalin, Mateusz Piotrowski, Thomas McGrath, Michael Hanna, Owen Lewis, Curt Tigges, Jack Merullo

June 28, 2025

Towards Scalable Parameter Decomposition

Lucius Bushnaq, Dan Braun, Lee Sharkey

June 11, 2025

Replicating Circuit Tracing for a Simple Known Mechanism

Max Loeffler, Owen Lewis, Thomas McGrath, Connor Watts, Jack Merullo, Liv Gorton, Elana Simon

May 27, 2025

Painting With Concepts Using Diffusion Model Latents

Nick Cammarata, Mark Bissell, Nam Nguyen, Max Loeffler, Eric Ho, Myra Deng, Liv Gorton, Daniel Balsam

April 15, 2025

Under the Hood of a Reasoning Model

Dron Hazra, Max Loeffler, Murat Cubuktepe, Levon Avagyan, Liv Gorton, Mark Bissell, Owen Lewis, Thomas McGrath, Daniel Balsam

February 20, 2025

Interpreting Evo 2: Arc Institute's Next-Generation Genomic Foundation Model

Liv Gorton, Nicholas Wang, Nam Nguyen, Myra Deng, Eric Ho, Daniel Balsam, Thomas McGrath

December 23, 2024

Mapping the Latent Space of Llama 3.3 70B

Thomas McGrath, Daniel Balsam, Liv Gorton, Murat Cubuktepe, Myra Deng, Nam Nguyen, Akshaj Jain, Thariq Shihipar, Eric Ho

September 25, 2024

Understanding and Steering Llama 3 with Sparse Autoencoders

Thomas McGrath, Daniel Balsam, Myra Deng, Eric Ho

