We’re thrilled to announce our collaboration with Arc Institute, a nonprofit research organization pioneering long-context biological foundation models (the “Evo” series). Through our partnership, we’ve developed methods to understand their model with unprecedented precision, enabling the extraction of meaningful units of model computation (i.e., features).
Today, Arc has announced their next-generation biological foundation model, Evo 2, featuring 7B and 40B parameter architectures capable of processing sequences of up to 1M base pairs at nucleotide-level resolution. Trained across all domains of life, it enables both prediction and generation at multiple levels of biological complexity. Through our collaboration, Goodfire and Arc have made exciting progress in applying interpretability techniques to Evo 2, discovering numerous biologically relevant features in the models, ranging from semantic elements like exon-intron boundaries to more abstract concepts such as protein secondary structure.
Biological foundation models represent a unique challenge and opportunity for AI interpretability. Unlike language models that process human-readable text, these neural networks operate on DNA sequences—a biological code that even human experts struggle to read and understand directly. Evo 2 tackles an especially complex version of this challenge, processing multiple layers of biological information: from raw DNA sequences to the proteins they encode and the intricate RNA structures they form. By applying state-of-the-art interpretability techniques (similar to those detailed in our Understanding and Steering Llama 3 paper), we hope to surface what Evo 2 has learned about this biological code in a form that scientists can inspect and build on.
This interpretability breakthrough could deepen our understanding of biological systems while enabling new approaches to genome engineering. These advances open possibilities for developing better disease treatments and improving human health.
We provide a high-level overview of the work we’ve done below. The mechanistic interpretability sections (2.4, 4.4) of the preprint contain more detailed information on our findings.
In our collaboration with Arc, we trained BatchTopK sparse autoencoders (SAEs) on layer 26 of Evo 2, applying techniques we’ve developed while interpreting language models. (We explain why BatchTopK and why layer 26 below.)
We discovered a wide range of features corresponding to sophisticated biological concepts. We also validated the relevance of many of these features with a large-scale alignment analysis between canonical biological concepts and SAE features (quantified by measuring the domain-F1 score between features and concepts).
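As a rough illustration of what such an alignment looks like in practice, here is a minimal Python sketch that scores one SAE feature against one per-nucleotide concept annotation with a plain F1 score. The preprint defines the domain-adjusted F1 we actually report; the function and variable names below are illustrative, not our analysis code.

```python
import numpy as np

def feature_concept_f1(feature_acts: np.ndarray,
                       concept_mask: np.ndarray,
                       threshold: float = 0.0) -> float:
    """F1 between a binarized SAE feature and a per-nucleotide concept
    annotation (e.g. an exon or tRNA track). Illustrative only: the
    preprint's domain-F1 additionally accounts for domain structure."""
    pred = feature_acts > threshold          # positions where the feature fires
    tp = np.sum(pred & concept_mask)
    fp = np.sum(pred & ~concept_mask)
    fn = np.sum(~pred & concept_mask)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_feature_for_concept(all_feature_acts, concept_mask):
    """Score every feature against one concept and return the best match."""
    scores = [feature_concept_f1(acts, concept_mask) for acts in all_feature_acts]
    return int(np.argmax(scores)), float(np.max(scores))
```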
We have some early signs of life on steering Evo 2 to precisely engineer new protein structures, but steering this model is considerably more complex than steering a language model. Further research is needed to unlock the full potential of this approach. The potential impact of steering Evo 2 is particularly significant: while language models can be prompted to achieve desired behaviors, a model that ‘only speaks nucleotide’ cannot. Learning to steer through features would unlock entirely new capabilities.
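For intuition, the sketch below shows the generic form of feature-based activation steering we use with language models: adding a multiple of one SAE feature’s decoder direction to the activations at a hooked layer. It is a hypothetical illustration, assuming access to the SAE decoder weights and a hook at the relevant Evo 2 layer, not a description of the exact intervention used in our experiments.

```python
import torch

def steer_with_feature(hidden_states: torch.Tensor,
                       sae_decoder: torch.Tensor,
                       feature_idx: int,
                       strength: float) -> torch.Tensor:
    """Add a multiple of one SAE feature's decoder direction to the
    residual-stream activations at a hooked layer (generic sketch,
    not Evo 2's actual steering interface).

    hidden_states : (batch, seq_len, d_model) activations at the layer
    sae_decoder   : (n_features, d_model) SAE decoder weight matrix
    """
    direction = sae_decoder[feature_idx]
    direction = direction / direction.norm()     # unit-norm feature direction
    return hidden_states + strength * direction  # broadcasts over batch and positions
```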
We trained both standard ReLU SAEs and BatchTopK SAEs. A BatchTopK SAE follows the same encoder–decoder recipe as a standard TopK SAE; however, the nonlinearity is the TopK operation taken over the entire batch of inputs (with batch size $B$):

$$f(X) = \mathrm{BatchTopK}\left(W_{\mathrm{enc}} X + b_{\mathrm{enc}}\right),$$

where the BatchTopK operation sets all but the top $B \cdot k$ elements of the batch of pre-activations to zero. This allows variable capacity per token, but retains the loss improvements, computational efficiency, and ease of hyperparameter selection that are the advantages of TopK SAEs. Training loss was superior to the ReLU baseline and features appeared equally crisp. We give some examples of interesting, biologically relevant features below.
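To make the description above concrete, here is a minimal PyTorch sketch of a BatchTopK SAE forward pass. The dimensions, hyperparameters, and omission of standard training details (decoder normalization, auxiliary losses, and so on) make this an illustration rather than our training code.

```python
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    """Minimal sketch of a BatchTopK sparse autoencoder
    (illustrative, not the configuration used for Evo 2)."""

    def __init__(self, d_model: int, n_features: int, k: int):
        super().__init__()
        self.k = k                                 # average active features per input
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, d_model) -- a batch of residual-stream activations
        pre = self.enc(x)
        flat = pre.flatten()
        # Keep the top B*k pre-activations across the whole batch, zero the rest.
        topk = torch.topk(flat, k=self.k * x.shape[0])
        mask = torch.zeros_like(flat)
        mask[topk.indices] = 1.0
        acts = (flat * mask).reshape(pre.shape)
        return self.dec(acts)                      # reconstruction of x
```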
We chose an even mix of eukaryotic and prokaryotic data in order to find both domain-specific and generalizing features. Because SAEs yield interesting and useful features from relatively little data compared to foundation models, we were able to restrict training to high-quality reference genomes from both domains.
We trained SAEs on multiple layers of the model, including both Transformer and StripedHyena layers. Layer 26 (a StripedHyena layer) had the most interesting biologically-relevant features on an initial inspection, so we focused additional effort on studying this layer.
Evo 2 7B has 32 layers, so layer 26 is relatively late compared to our natural language model SAEs. This might be due to the much lower vocabulary size: a nucleotide-level model of the genome has very few tokens in its vocabulary (primarily A, T, C, G, though some additional tokens are also used during training) compared to the very large vocabulary size of a natural language model. This might mean that once the relevant patterns have been computed, fewer layers of the model are then required to both select the relevant next token and correctly calibrate probabilities between plausible options.
To make these features more tangible, we built an interactive feature visualizer showcasing some example SAE features as they relate to known biological concepts. Here, you can see activation values of these features across a set of bacterial reference genomes and how they align with existing genomic annotations.
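The visualizer itself is interactive, but the underlying idea can be sketched in a few lines: plot a feature’s activation along a genomic region with the known annotation shaded behind it. The helper below is a hypothetical matplotlib example, not the visualizer’s code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_feature_vs_annotation(positions, feature_acts, annotation_mask,
                               feature_label="SAE feature", concept_label="annotation"):
    """Plot one SAE feature's activation along a genomic region, with a
    known annotation (e.g. a CDS or tRNA) shaded behind it."""
    fig, ax = plt.subplots(figsize=(10, 2.5))
    # Shade annotated positions behind the activation trace.
    ax.fill_between(positions, 0, annotation_mask * np.max(feature_acts),
                    step="mid", alpha=0.2, label=concept_label)
    ax.plot(positions, feature_acts, lw=1.0, label=feature_label)
    ax.set_xlabel("genomic position (bp)")
    ax.set_ylabel("feature activation")
    ax.legend(loc="upper right")
    return fig
```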
For example, we discovered that the model has learned to identify several key biological concepts, from gene structural elements like exon-intron boundaries to protein secondary structure.
When looking at well-annotated regions of the E. coli genome, we found that the model could not only recognize gene structure and RNA segments, but had also learned more abstract concepts such as protein secondary structure. In the figure below, activation values for features associated with α-helices, β-sheets, and tRNAs are shown for a region containing a tRNA array and the tufB gene. On the right, these feature activations are overlaid on AlphaFold3’s predicted structure of the EF-Tu protein encoded by tufB.
SAE model-derived feature activations for α-helices, β-sheets, and tRNAs (left) in the E. coli genomic region encompassing the thrT and tufB genes, alongside AlphaFold’s predicted EF-Tu (tufB) protein structure (right).
You can reference Figure 4 in the preprint to learn more about the most salient features we discovered across the various scales of biological complexity.
Preprint Figure 4: Mechanistic interpretability of Evo 2 reveals DNA, RNA, protein, and organism level features
This announcement provides a high-level overview of our work with Arc Institute. We’re currently working to publish a more comprehensive study in the coming months that will detail our interpretability methodology.
We’re excited about AI interpretability’s potential to accelerate meaningful scientific discovery. If you’re working on scientific foundation models, we’d love to explore how we can help interpret them.