Evo-2, an open-source AI model developed with the Arc Institute and powered by NVIDIA DGX Cloud, is revolutionising biotech by decoding DNA, RNA, and proteins across all life forms, unlocking endless possibilities for scientists worldwide.
For decades, biology has been a field of observation, where scientists have studied the intricate complexities of DNA, proteins, and cellular structures. But with AI’s rapid advancement, biology is shifting from a descriptive science to a design-driven discipline. Enter Evo-2, a groundbreaking AI model that doesn’t just analyse biology; it generates it.
Evo-2 is a powerful new foundation model built using NVIDIA DGX Cloud on AWS in a collaboration led by nonprofit biomedical research organisation Arc Institute and Stanford University. It provides insights into DNA, RNA, and proteins across diverse species. Trained on an enormous dataset of nearly 9 trillion nucleotides (the building blocks of DNA and RNA), the model can be applied to biomolecular research applications, including predicting the form and function of proteins based on their genetic sequence, identifying novel molecules for healthcare and industrial applications, and evaluating how gene mutations affect their function.
NVIDIA accelerated the Evo-2 project by giving scientists access to 2,000 NVIDIA H100 GPUs via NVIDIA DGX Cloud on AWS. DGX Cloud provides short-term access to large compute clusters, giving researchers the flexibility to innovate. The fully managed AI platform includes NVIDIA BioNeMo, which features optimised software in the form of NVIDIA NIM microservices and NVIDIA BioNeMo Blueprints.
Think of Evo-2 as a DNA-focused large language model (LLM). Instead of text, it generates genomic sequences. It can read and interpret complex DNA, including noncoding regions traditionally dismissed as “junk,” generate entire chromosomes, and even predict disease-causing mutations, including those previously unknown to science. This is biology hacking on an entirely new level.
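To make the "LLM for DNA" analogy concrete, here is a minimal sketch of next-base generation. It is not Evo-2 and does not use its code or API: a tiny k-mer count table stands in for the neural network, and the names `fit_next_base_counts` and `generate` are hypothetical. The point is only the loop itself: look at the preceding bases, pick the next one, append it, repeat.

```python
import random
from collections import Counter, defaultdict

def fit_next_base_counts(training_seq, k=4):
    """Count which base follows each k-base context (toy stand-in for training)."""
    counts = defaultdict(Counter)
    for i in range(k, len(training_seq)):
        counts[training_seq[i - k:i]][training_seq[i]] += 1
    return counts

def generate(counts, prompt, n_new, k=4, seed=0):
    """Extend `prompt` by `n_new` bases, one base at a time."""
    rng = random.Random(seed)
    seq = list(prompt)
    for _ in range(n_new):
        context = "".join(seq[-k:])
        options = counts.get(context)
        if options:
            # Sample the next base in proportion to how often it followed this context.
            bases, weights = zip(*options.items())
            seq.append(rng.choices(bases, weights=weights)[0])
        else:
            # Unseen context: fall back to a uniformly random base.
            seq.append(rng.choice("ACGT"))
    return "".join(seq)

# Toy training data, purely illustrative.
counts = fit_next_base_counts("ACGT" * 20)
print(generate(counts, "ACGT", 8))
```

A real genomic language model replaces the count table with a neural network that conditions on a far longer context, but the generate-one-base-and-repeat loop is the same idea.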
The implications of this are staggering. AI is no longer just describing biology; it is now designing it. Evo-2 opens the door to synthetic life engineered from scratch, programmable genomes optimised by AI, and entirely new approaches to gene therapy. With the ability to process up to 1 million base pairs in a single context window, it can identify evolutionary patterns that humans have never seen before.
Evo-2 also predicts whether mutations are harmful or benign without specific training on human disease data. In fact, it outperforms specialised models on BRCA1 variants, demonstrating that it has learned DNA’s fundamental principles. It even generates DNA sequences that influence chromatin accessibility, effectively controlling gene expression. As a proof of concept, researchers have embedded simple Morse code into epigenomic designs, hinting at the possibility of programmable gene circuits in the future.
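One common way a sequence model can rate a mutation without any disease labels is to compare how likely it finds the mutated sequence versus the original. The sketch below illustrates that idea with a toy k-mer model; it is an assumption-laden stand-in, not Evo-2's actual scoring code, and the function names `train_markov`, `sequence_log_likelihood`, and `variant_score` are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def train_markov(seqs, k=3, pseudocount=1.0):
    """Fit a toy k-mer model and return a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1

    def log_prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudocount * len(BASES)
        return math.log((seen.get(base, 0.0) + pseudocount) / total)

    return log_prob

def sequence_log_likelihood(log_prob, seq, k=3):
    """Sum the log-probability of each base given the k bases before it."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))

def variant_score(log_prob, ref, pos, alt, k=3):
    """Log-likelihood of the mutated sequence minus that of the reference.

    Negative scores mean the model finds the variant less plausible.
    """
    mutated = ref[:pos] + alt + ref[pos + 1:]
    return (sequence_log_likelihood(log_prob, mutated, k)
            - sequence_log_likelihood(log_prob, ref, k))

# Toy data, purely illustrative.
model = train_markov(["ACGT" * 50])
reference = "ACGT" * 5
print(variant_score(model, reference, pos=6, alt="T"))
```

In this toy setting, a substitution that breaks the pattern the model learned receives a negative score, while leaving the base unchanged scores exactly zero.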
Perhaps the most remarkable aspect of Evo-2 is that it is fully open source. Every aspect (the model parameters, training data, and code) is freely available. This radically lowers the barriers to innovation in bioengineering, allowing researchers, startups, and institutions to build upon this breakthrough without restriction.
We are at a turning point in history. Three years ago, AI was focused on chatbots. Now, it generates genomes. Soon, it will design entire biological systems. Humanity is no longer just studying life; we are rewriting its code.
Based on the breakthrough capabilities demonstrated in Arc Institute's academic paper (found below), here are some of the most fascinating potential future applications of Evo-2, brought to you by Claude 3.5 Sonnet:
The technology's ability to generate complete mitochondrial genomes and predict genomic features across species suggests it could potentially help reconstruct extinct species' genomes. Imagine using partial DNA fragments from fossils to generate complete viable genomes of extinct creatures like the woolly mammoth or dodo bird, but with enhanced traits for modern survival.
Given Evo-2's deep understanding of DNA structure and function, we could potentially design living organisms with engineered genomes that act as biological hard drives. These could be bacteria with artificially expanded genetic codes that store and replicate digital data in their DNA, while remaining fully functional organisms.
The technology could enable the design of entire interconnected networks of novel organisms - from bacteria to plants - that work together to perform complex functions such as terraforming harsh environments. Each organism's genome would be precisely engineered to fill a specific ecological niche while maintaining system stability.
Imagine therapeutic organisms designed to evolve in real time within the body, with genomes pre-programmed to adapt to individual patient conditions while maintaining safety constraints. The Evo-2 model could predict and guide their evolutionary trajectories.
Using the technology's understanding of protein interactions and cellular signalling, we could potentially design living neural networks made of engineered cells that process information through precisely designed genetic circuits, creating organic computing systems.
The model's ability to work across all domains of life suggests the possibility of "translating" beneficial traits between vastly different species - like giving plants certain mammalian cellular repair mechanisms while ensuring compatibility with their existing systems.
Design of crops with dynamically responsive genomes that can rapidly adapt to changing climate conditions within safe parameters, using the model's predictive capabilities to ensure both adaptability and stability.
Entire facilities made of engineered organisms working together - some producing raw materials, others processing them, and others handling waste recycling, all orchestrated through precisely designed genetic programs.
Engineering extremophile organisms specifically designed to survive and terraform other planets, with genomes optimised for alien environments while maintaining essential Earth-life compatibility.
Creating intermediate species that bridge major evolutionary gaps, helping us understand key evolutionary transitions by designing organisms with carefully selected traits from different branches of the tree of life.
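The "biological hard drive" idea above rests on a simple observation: with four bases, each nucleotide can carry two bits. Here is a minimal sketch of that mapping, assuming a naive fixed code (A=00, C=01, G=10, T=11) and hypothetical function names. Practical DNA storage schemes add error correction and avoid problematic sequences, none of which is modelled here, and nothing in this sketch comes from Evo-2.

```python
# Assumed, illustrative mapping of 2-bit values to bases.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode_bytes(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)

def decode_bases(seq: str) -> bytes:
    """Invert encode_bytes: every four bases become one byte."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)

message = b"Evo"
dna = encode_bytes(message)
print(dna, decode_bases(dna))
```

At two bits per base, one byte takes four nucleotides, so a kilobyte of data would need 4,096 bases before any redundancy is added.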
For more details on this stunning breakthrough, read the Arc Institute's full academic paper below: