Interpreting Evo 2: Arc Institute's Next-Generation Genomic Foundation Model

Interpreting Evo 2

We’re thrilled to announce our collaboration with Arc Institute, a nonprofit research organization pioneering long-context biological foundation models (the “Evo” series). Through our partnership, we’ve developed methods to understand their model with unprecedented precision, enabling the extraction of meaningful units of model computation, i.e., features: interpretable patterns extracted from the network’s neuron activity that represent meaningful concepts, like the model’s understanding of α-helices. Preliminary experiments have shown promising directions for steering these features to guide DNA sequence generation, though this work is still in its early stages.

Introducing “Interpretable” Evo 2

Today, Arc has announced their next-generation biological foundation model, Evo 2, featuring 7B and 40B parameter architectures capable of processing sequences up to 1M base pairs at nucleotide-level resolution. Trained across all domains of life, it enables both prediction and generation across various levels of biological complexity. Through our collaboration, Goodfire and Arc have made exciting progress in applying interpretability techniques to Evo 2, discovering numerous biologically relevant features in the models, ranging from semantic elements like exon-intron boundaries (the junctions where protein-coding exons meet non-coding introns, which guide RNA splicing and where mutations can disrupt protein production) to higher-level concepts such as protein secondary structure (the local folding patterns, mainly α-helices and β-sheets, that help determine a protein’s final shape and function).

Why This Matters

Biological foundation models represent a unique challenge and opportunity for AI interpretability. Unlike language models that process human-readable text, these neural networks operate on DNA sequences, a biological code that even human experts struggle to read directly. Evo 2 presents an especially complex version of this challenge, processing multiple layers of biological information: from raw DNA sequences to the proteins they encode and the intricate RNA structures they form. By applying state-of-the-art interpretability techniques (similar to those detailed in our Understanding and Steering Llama 3 paper), we hope to make Evo 2’s internal representations legible and, ultimately, controllable.

This interpretability breakthrough could deepen our understanding of biological systems while enabling new approaches to genome engineering. These advances open possibilities for developing better disease treatments and improving human health.

We provide a high level overview of the work we’ve done below. The mechanistic interpretability sections (2.4, 4.4) of the preprint contain more detailed information on our findings.

Interpreter model overview

Training an Evo 2 interpreter model (sparse autoencoder or SAE)

In our collaboration with Arc, we trained BatchTopK sparse autoencoders (SAEs) on layer 26 of Evo 2 (see “Why did we switch to BatchTopK?” and “Why layer 26?” below), applying techniques we’ve developed while interpreting language models. Working closely with Arc Institute scientists, we used these tools to understand how Evo 2 processes genetic information internally.

We discovered a wide range of features corresponding to sophisticated biological concepts. We also validated the relevance of many of these features with a large-scale alignment analysis between canonical biological concepts and SAE features (quantified by measuring the domain-F1 score between features and concepts).
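For readers who want a concrete picture of what such an alignment analysis involves, here is a minimal sketch that scores one SAE feature against one binary annotation track using a plain F1 score. The array layout, threshold, and function names are illustrative assumptions, and the preprint’s domain-F1 metric may be computed differently.

```python
# Sketch: aligning one SAE feature with one genomic annotation track.
# Per-nucleotide arrays, the threshold, and names are illustrative assumptions.
import numpy as np
from sklearn.metrics import f1_score

def feature_concept_f1(feature_acts: np.ndarray,
                       concept_mask: np.ndarray,
                       threshold: float = 0.0) -> float:
    """F1 between a binarized SAE feature and a binary concept annotation.

    feature_acts: per-nucleotide activation values for one SAE feature.
    concept_mask: 1 where the annotation (e.g. 'exon') covers the nucleotide.
    """
    predicted = (feature_acts > threshold).astype(int)
    return f1_score(concept_mask, predicted)

# Usage: score every feature against an 'exon' track and keep the best match.
# acts: [n_features, genome_length], exon_mask: [genome_length]
# best = max(range(acts.shape[0]),
#            key=lambda i: feature_concept_f1(acts[i], exon_mask))
```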

We have some early signs of life on steering Evo 2 to precisely engineer new protein structures, but steering this model is considerably more complex than steering a language model. Further research is needed to unlock the full potential of this approach. The potential impact of steering Evo 2 is particularly significant: while language models can be prompted to achieve desired behaviors, a model that ‘only speaks nucleotide’ cannot. Learning to steer through features would unlock entirely new capabilities.
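To make the idea of feature steering concrete, the sketch below shows one common approach from language-model interpretability: adding a scaled SAE decoder direction back into a layer’s activations during generation. The hook interface, layer index, and scale here are hypothetical placeholders for illustration, not Evo 2’s actual steering setup.

```python
# Sketch: steering by adding an SAE feature's decoder direction to activations.
# The model/hook interface is hypothetical; Evo 2's real API may differ.
import torch

def make_steering_hook(decoder_weights: torch.Tensor,
                       feature_idx: int,
                       scale: float = 5.0):
    """Return a forward hook that nudges activations along one feature direction."""
    direction = decoder_weights[:, feature_idx]        # [d_model]
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        # Assumes `output` is the [batch, seq_len, d_model] activation tensor.
        return output + scale * direction.to(output.dtype)

    return hook

# Hypothetical usage: register on the layer the SAE was trained on,
# then sample DNA sequences as usual.
# handle = model.blocks[26].register_forward_hook(
#     make_steering_hook(W_d, alpha_helix_feature))
# ... generate ...
# handle.remove()
```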

Why did we switch to BatchTopK?

We trained both standard ReLU SAEs and BatchTopK SAEs (a variant of TopK SAEs) on a later layer (layer 26) of Evo 2. We were initially concerned that the TopK-style architecture might produce high-frequency features; since we didn’t observe any during training, we switched to the BatchTopK variant. As with conventional SAEs, the architecture is

$$
\begin{align*}
\hat{x} &= \mathrm{Dec}\left(\mathrm{Enc}(x)\right) \\
\mathrm{Enc}(x) &= \sigma(W_e x + b_e) \overset{\Delta}{=} f \\
\mathrm{Dec}(f) &= W_d f + b_d
\end{align*}
$$

However, the nonlinearity $\sigma(\cdot)$ is the TopK operation over the entire batch of inputs (with batch size $B$):

$$
\sigma(f) = \mathrm{TopK}(f, kB)
$$

where the $\mathrm{TopK}(x, k)$ operation sets all but the top $k$ elements of $x$ to zero. This allows variable capacity per token, but retains the loss improvements, computational efficiency, and ease of hyperparameter selection that are the advantages of TopK SAEs. Training loss was superior to ReLU and features appeared equally crisp. We give some examples of interesting, biologically relevant features below.
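To make the equations above concrete, here is a minimal PyTorch sketch of a BatchTopK SAE forward pass. Initialization details, auxiliary losses, and other training machinery are omitted, and this is not the code we actually used.

```python
# Sketch of a BatchTopK SAE forward pass (illustrative, not the training code).
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, k: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)  # encoder: W_e x + b_e
        self.dec = nn.Linear(d_hidden, d_model)  # decoder: W_d f + b_d
        self.k = k                               # average active features per token

    def forward(self, x: torch.Tensor):
        # x: [B, d_model], a batch of layer activations
        pre = self.enc(x)
        B = x.shape[0]
        kB = min(self.k * B, pre.numel())
        # BatchTopK: keep the kB largest pre-activations across the whole batch,
        # zero everything else (so capacity varies per token).
        threshold = pre.flatten().topk(kB).values[-1]
        f = torch.where(pre >= threshold, pre, torch.zeros_like(pre))
        x_hat = self.dec(f)                      # reconstruction
        return x_hat, f

# Usage: train with a reconstruction objective such as ((x_hat - x) ** 2).mean().
```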

We chose an even mix of eukaryotic and prokaryotic data in order to find both domain-specific and generalizing features. Because SAEs yield interesting and useful features with relatively little data compared to foundation models, we were able to restrict training to high-quality reference genomes from both domains.

Why layer 26?

We trained SAEs on multiple layers of the model, including both Transformer and StripedHyena layers. Layer 26 (a StripedHyena layer) had the most interesting biologically-relevant features on an initial inspection, so we focused additional effort on studying this layer.

Evo 2 7B has 32 layers, so layer 26 is relatively late compared to our natural language model SAEs. This might be due to the much lower vocabulary size: a nucleotide-level model of the genome has very few tokens in its vocabulary (primarily A, T, C, G, though some additional tokens are also used during training) compared to the very large vocabulary size of a natural language model. This might mean that once the relevant patterns have been computed, fewer layers of the model are then required to both select the relevant next token and correctly calibrate probabilities between plausible options.

Deciphering Evo 2’s latent space

Visualizing interpretable features from our Evo 2 SAE

To make these features more tangible, we built an interactive feature visualizer showcasing some example SAE features as they relate to known biological concepts. Here, you can see activation values of these features across a set of bacterial reference genomes and how they align with existing genomic annotations.

For example, we discovered that the model has learned to identify several key biological concepts, including:

- Coding sequences
- Protein secondary structures like α-helices and β-sheets
- RNA molecules involved in protein synthesis
- Virus-derived sequences
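As a rough sketch of the kind of overlay the visualizer shows, the snippet below plots one feature’s per-nucleotide activations with an annotated interval shaded behind it. The array and annotation formats are hypothetical placeholders, not the visualizer’s actual data pipeline.

```python
# Sketch: overlaying one SAE feature's activations on a genomic annotation.
# The arrays and the (start, end, label) annotation format are placeholders.
import matplotlib.pyplot as plt
import numpy as np

def plot_feature_track(acts: np.ndarray, annotations, title: str):
    """acts: per-nucleotide activations; annotations: list of (start, end, label)."""
    fig, ax = plt.subplots(figsize=(10, 2.5))
    ax.plot(np.arange(len(acts)), acts, lw=0.8, label="feature activation")
    for start, end, label in annotations:
        ax.axvspan(start, end, alpha=0.2, label=label)  # shade the annotated region
    ax.set_xlabel("genomic position (bp)")
    ax.set_ylabel("activation")
    ax.set_title(title)
    ax.legend(loc="upper right", fontsize=7)
    fig.tight_layout()
    return fig

# Usage: plot_feature_track(trna_feature_acts, [(1200, 1290, "tRNA")], "tRNA feature")
```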

Annotating a bacterial genome

When looking at well-annotated regions of the E. coli genome, we found that the model not only recognized gene structure and RNA segments but also learned more abstract concepts such as protein secondary structure. In the figure below, activation values for features associated with α-helices, β-sheets, and tRNAs are shown for a region containing a tRNA array and the tufB gene. On the right, these feature activations are overlaid on AlphaFold3’s structural prediction of this protein-RNA complex, demonstrating Evo 2’s deep understanding of how DNA sequence affects downstream RNA and protein products, and how we can decompose this understanding into clear, interpretable components.

Verification of Evo 2's features with AlphaFold3

SAE model-derived feature activations for α-helices, β-sheets, and tRNAs (left) in the E. coli genomic region encompassing the thrT and tufB genes, alongside AlphaFold3’s predicted EF-Tu (tufB) protein structure (right).

Additional noteworthy examples

You can reference Figure 4 in the preprint to learn more about the most salient features we discovered across the various scales of biological complexity.

Preprint Figure 4: Mechanistic interpretability of Evo 2 reveals DNA, RNA, protein, and organism level features

Looking Ahead

This announcement provides a high-level overview of our work with Arc Institute. We’re currently working to publish a more comprehensive study in the coming months that will detail our interpretability methodology.

We’re excited about AI interpretability’s potential to accelerate meaningful scientific discovery. If you’re working on scientific foundation models, we’d love to explore how we can help interpret them.