biological data · ai funding · data licensing · defense ai · regulation · July 1, 2026

EvolutionaryScale Secures $142M to Scale Biological Data Models

The startup closes a seed round to build generative AI using a dataset of 2.78 billion protein sequences.

EvolutionaryScale has closed a $142 million seed round (https://www.forbes.com/sites/alexkonrad/2024/06/25/evolutionaryscale-raises-142-million-for-biology-ai/) to accelerate the development of generative AI models trained on massive biological datasets. The round, led by Nat Friedman, Daniel Gross, and Lux Capital, positions the company to treat biology as a programmable data asset, building on its new ESM3 model, which was trained on a dataset of 2.78 billion protein sequences (https://www.evolutionaryscale.ai/blog/esm3-release). The raise underscores the escalating value of non-textual, domain-specific data in the race for frontier AI capabilities.

The Rise of Biological Data Assets

Unlike general-purpose LLMs that scrape the public web, EvolutionaryScale’s value proposition is built on the curation and processing of specialized biological information. The ESM3 model is a multimodal generative model that can reason over the sequence, structure, and function of proteins. By processing trillions of data points (https://www.evolutionaryscale.ai/blog/esm3-release) from the natural world, the startup aims to enable researchers to "program" new proteins, potentially shortening drug discovery timelines from years to weeks. This "ChatGPT for biology" approach highlights a broader market trend: the monetization of proprietary, high-fidelity scientific datasets that cannot be easily replicated by generic crawlers.

Licensing vs. Litigation: The Battle for Data Rights

The funding of data-heavy startups like EvolutionaryScale comes as the legal landscape for data acquisition reaches a boiling point. OpenAI and Time Magazine recently finalized a multi-year content licensing agreement (https://openai.com/index/openai-and-time-sign-multi-year-content-partnership-and-strategic-alliance/), granting OpenAI access to Time’s 101-year-old archive. While the exact financial terms were not disclosed, industry analysts point to OpenAI’s estimated $250 million deal with News Corp (https://www.reuters.com/technology/news-corp-strikes-ai-content-licensing-deal-with-openai-2024-05-22/) as a benchmark for the premium now placed on verified human journalism.

Conversely, the cost of unlicensed data acquisition is becoming prohibitively high. The RIAA, representing major labels such as Sony and Universal, is seeking statutory damages of up to $150,000 per work (https://www.reuters.com/legal/music-labels-sue-suno-udio-ai-copyright-infringement-2024-06-24/) in lawsuits against AI music startups Suno and Udio. With a large catalog of recordings allegedly used without permission, the total liability could reach an estimated $13.5 billion (https://www.reuters.com/legal/music-labels-sue-suno-udio-ai-copyright-infringement-2024-06-24/). This legal pressure is forcing a shift from reliance on a "fair use" defense toward a structured data marketplace in which every training token has clear provenance and a price tag.
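The exposure figure is simple multiplication, and it helps to see what it assumes. The sketch below uses only the two numbers quoted above (the $150,000 per-work ceiling and the $13.5 billion estimate). The function names are hypothetical, and the calculation assumes every work draws the maximum award, which is an upper bound rather than a prediction.

```python
# Illustrative arithmetic only; figures are taken from the article, not from court filings.
STATUTORY_MAX_PER_WORK = 150_000  # USD, the per-work ceiling cited above


def max_statutory_exposure(num_works: int, per_work: int = STATUTORY_MAX_PER_WORK) -> int:
    """Upper bound on damages if every work received the maximum award."""
    return num_works * per_work


def implied_work_count(total_exposure: int, per_work: int = STATUTORY_MAX_PER_WORK) -> int:
    """Number of works that a total exposure figure implies at the per-work ceiling."""
    return total_exposure // per_work


if __name__ == "__main__":
    works = implied_work_count(13_500_000_000)
    print(f"Works implied by a $13.5B estimate: {works:,}")
    print(f"Exposure for that many works: ${max_statutory_exposure(works):,}")
```

At the $150,000 ceiling, a $13.5 billion total corresponds to 90,000 works. Actual awards, if any, could fall well below the ceiling.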

Capital Inflow into Data-Intensive Infrastructure

The demand for data-ready AI has also triggered large infrastructure investments. Helsing, a European defense AI firm, has raised €450 million (https://www.reuters.com/technology/defense-ai-startup-helsing-raises-450-mln-euro-funding-round-2024-07-04/) in a Series C round that values the company at an estimated €5 billion (https://www.bloomberg.com/news/articles/2024-06-17/defense-ai-startup-helsing-is-said-to-near-400-million-funding). Helsing’s software-defined defense systems rely on real-time processing of battlefield sensor data, making defense a critical vertical for data asset monetization in the public sector. Similarly, Etched.ai raised $120 million (https://techcrunch.com/2024/06/25/etched-raises-120m-to-build-a-chip-that-only-runs-transformer-models/) to build specialized chips designed to handle the data throughput required by Transformer models.

In the legal tech space, Harvey is reportedly in talks to raise new capital at an estimated $2 billion valuation (https://techcrunch.com/2024/06/25/legal-ai-startup-harvey-is-raising-600m-from-google-at-a-2b-valuation/). Harvey’s core asset is its access to and processing of proprietary legal data, further evidence that the market is rewarding companies that control the "data moat" rather than just the algorithm.

Why it matters for data owners

For institutional data owners, the EvolutionaryScale and OpenAI-Time deals suggest that the era of free data scraping is ending. Data is no longer a byproduct of business operations; it is a primary asset class. Whether the asset is biological sequences, historical archives, or legal precedents, the market now offers two distinct paths: multimillion-dollar licensing partnerships for those who cooperate, and multibillion-dollar litigation over assets taken without consent. As AI models become more specialized, the value of niche, high-integrity datasets is likely to keep outpacing the value of generic web-scraped content.
