EvolutionaryScale Secures $142M for Biological Data AI
Former Meta researchers launch ESM3, a frontier model trained on 2.7 billion protein sequences to program biology.
EvolutionaryScale has closed a $142 million seed round (https://www.evolutionaryscale.ai/blog/esm3-release) led by Nat Friedman, Daniel Gross, and Lux Capital to commercialize frontier AI models for biological data. The round is one of the largest seed financings in the history of biotech-focused AI and signals strong investor appetite for "Physical AI": systems capable of understanding and manipulating the building blocks of the physical world. At the core of the deal is the release of ESM3, a generative model trained on a mix of proprietary and public data spanning 2.7 billion protein sequences (https://www.evolutionaryscale.ai/blog/esm3-release), which allows researchers to effectively "program" biology by simulating billions of years of evolution in a digital environment.
The Multi-Modal Edge in Biological Data Assets
Unlike previous iterations of protein language models, ESM3 is a multi-modal frontier model. It does not merely predict structure; it reasons across sequence, structure, and function simultaneously. By processing a dataset of 2.7 billion sequences and their corresponding 3D structures (https://www.evolutionaryscale.ai/blog/esm3-release), the model can generate entirely new proteins that do not exist in nature. This capability transforms biological data from a passive record of evolution into an active asset for drug discovery, carbon capture, and materials science. The company, founded by the team behind Meta’s ESM project, is positioning itself as the "OpenAI of biology," offering a version of the model to the scientific community while retaining high-capacity versions for commercial partnerships.
Physical AI and the Shift in Data Monetization
The EvolutionaryScale deal highlights a broader trend in which the most valuable data assets are shifting from human-generated text to physical-world observations. While LLMs for text face diminishing returns and legal hurdles over copyright, biological data offers a vast, largely untapped frontier. ESM3 was trained using approximately 1.0 x 10^24 FLOPs of compute, meaning total floating-point operations (https://www.evolutionaryscale.ai/blog/esm3-release), a scale previously reserved for top-tier general-purpose models. This investment underscores the high cost, and the high potential return, of training models on specialized, high-fidelity physical data. As physical AI matures, the licensing of structured biological, chemical, and robotic datasets is expected to outpace general web-crawled data in terms of per-token value.
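To give a rough sense of what a training budget of about 10^24 floating-point operations means in hardware terms, the sketch below converts the figure quoted above into wall-clock days. The sustained per-accelerator throughput and the cluster size are hypothetical placeholders chosen only for illustration; they are not figures reported by EvolutionaryScale, and the function name is likewise an assumption.

```python
def training_days(total_flops: float,
                  sustained_flops_per_sec: float,
                  num_accelerators: int) -> float:
    """Estimate wall-clock days needed to perform `total_flops` operations.

    Assumes perfect scaling across accelerators, which real clusters
    do not achieve, so treat the result as a lower bound.
    """
    if sustained_flops_per_sec <= 0 or num_accelerators <= 0:
        raise ValueError("throughput and accelerator count must be positive")
    seconds = total_flops / (sustained_flops_per_sec * num_accelerators)
    return seconds / 86_400  # seconds per day


# Figure quoted in the article: roughly 1.0 x 10^24 total operations.
TOTAL_FLOPS = 1.0e24

# Hypothetical assumptions, not reported values:
ASSUMED_SUSTAINED_FLOPS = 4.0e14  # 400 TFLOP/s sustained per accelerator
ASSUMED_ACCELERATORS = 1_000      # cluster size

days = training_days(TOTAL_FLOPS, ASSUMED_SUSTAINED_FLOPS, ASSUMED_ACCELERATORS)
print(f"Estimated training time: {days:.1f} days")
```

Under these assumed numbers the run works out to roughly a month of continuous use of a thousand accelerators. Changing either assumption scales the estimate proportionally.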
The Competitive Landscape: Data Moats in Life Sciences
EvolutionaryScale enters a market currently dominated by DeepMind’s AlphaFold 3, but with a distinct focus on generative design rather than structural prediction alone. The competitive moat in this sector is moving away from model architecture and toward the scale and quality of the training corpus. By open-sourcing the weights for a 1.4-billion-parameter version of ESM3, the company is attempting to set the industry standard for biological data representation. Meanwhile, other players in the ecosystem are securing their own data pipelines; for instance, Poolside is in talks to raise about $500 million, according to Bloomberg, to apply similar foundation model principles to software engineering data, further illustrating the rush to dominate specific vertical data domains.
Regulation and the Legality of Data Acquisition
As these models scale, the legal framework for how data is acquired remains a critical pivot point for investors. In a significant decision for the data industry, a U.S. court recently sided with Bright Data in its long-standing legal battle with Meta (https://brightdata.com/blog/court-rules-in-favor-of-bright-data), affirming that scraping public data does not violate the Computer Fraud and Abuse Act (CFAA) or breach contracts when the data is not behind a login. This ruling offers important legal cover for AI companies like EvolutionaryScale that rely on large-scale harvesting of public scientific databases to augment their proprietary training sets. However, regulatory pressure is mounting elsewhere; the European Commission recently informed Apple of its preliminary view that its App Store rules breach the Digital Markets Act (https://ec.europa.eu/commission/presscorner/detail/en/ip_24_3433), a reminder that data gatekeepers are under increasing scrutiny regarding how they control access to ecosystem data.
Infrastructure and Licensing Innovations
The infrastructure required to process these biological datasets is also evolving. Etched recently announced a $120 million Series A (https://www.etched.com/announcing-etched) to build specialized chips for transformer models, aiming to provide the compute efficiency necessary for the next generation of data-intensive physical AI. On the licensing front, Perplexity AI has launched a new "Publishers Program" (https://www.perplexity.ai/hub/blog/perplexity-publishers-program) to create a revenue-sharing model with data owners, including Time and Der Spiegel. The program reflects a maturing data-for-AI market, one that is shifting from unauthorized scraping toward structured, multi-year licensing agreements that give AI companies stable, high-quality data pipelines while compensating the original creators.
Why it matters for data owners
For data owners, the EvolutionaryScale deal proves that highly specialized, non-textual datasets—such as genomic sequences or protein structures—are now among the most valuable assets in the AI economy. As foundation models move into the physical sciences, the ability to provide clean, structured, and ethically sourced data for "Physical AI" will command premium licensing fees. Data owners should focus on auditing their proprietary datasets for their generative potential, as the market is rapidly shifting from simple data storage to the active licensing of assets for model training and fine-tuning.