biological ai · data licensing · funding round · world models · July 1, 2026

EvolutionaryScale Secures $142M for Biological "World Models"

Nvidia and Amazon join a $142M seed round backing ESM3, a model trained on 2.78 billion proteins.

EvolutionaryScale has closed a $142 million seed round (https://www.forbes.com/sites/alexkonrad/2024/06/25/evolutionaryscale-raises-142-million-for-biological-ai-model/) to accelerate the development of AI "world models" capable of simulating and engineering biological systems. The round, led by Lux Capital, Nat Friedman, and Daniel Gross, with participation from Nvidia and Amazon, puts the startup at the front of the "data-for-biology" arms race. The capital is earmarked for further development of ESM3, a frontier language model for biology with 98 billion parameters (https://www.evolutionaryscale.ai/blog/esm3-release), trained on a dataset of 2.78 billion proteins (https://www.evolutionaryscale.ai/blog/esm3-release).

The Rise of Biological World Models

Unlike generative AI that works on text or pixels, EvolutionaryScale is building what researchers call a "world model" for the life sciences. By treating proteins as a language, ESM3 has generated a novel fluorescent protein far removed from those found in nature, a result the company estimates is equivalent to simulating 500 million years of evolution (https://www.evolutionaryscale.ai/blog/esm3-release). This capability signals a shift in the data asset market: the most valuable datasets are no longer just web-scraped text but highly specialized, structured biological sequences that can be used to "program" matter.

The Data Licensing Pivot: From Fair Use to Paid Assets

The EvolutionaryScale round coincides with a broader market shift toward licensed, high-integrity training data. As biological data is tokenized for drug discovery, media giants are monetizing their own archives. OpenAI recently signed a multi-year content licensing deal (https://time.com/6992661/time-openai-partnership/) with TIME, granting the AI lab access to more than 100 years of journalistic archives. The financial terms remain undisclosed (https://www.reuters.com/technology/openai-time-strike-multi-year-content-licensing-deal-2024-06-27/), but the deal follows the News Corp agreement, reportedly worth $250 million (https://www.reuters.com/technology/news-corp-signs-multi-year-ai-content-deal-with-openai-2024-05-22/). Similarly, YouTube is reportedly negotiating (https://www.ft.com/content/22759e6f-479e-41a4-9e7b-f947702f23b2) with major record labels, including Sony and Universal, and offering lump sums estimated in the millions of dollars (https://www.ft.com/content/22759e6f-479e-41a4-9e7b-f947702f23b2) for licensed access to music catalogs for AI training.

Regulatory Headwinds and Data Provenance

As the value of training data rises, regulators and creators are pushing back against unauthorized use. Figma recently disabled its "Make Design" AI feature (https://www.theverge.com/2024/7/1/24189917/figma-disables-ai-design-tool-apple-weather-app-copying) following allegations that it was trained on existing app designs, highlighting the legal risks of opaque data pipelines. SoftBank's reported $10 million to $20 million investment (https://www.bloomberg.com/news/articles/2024-06-27/softbank-to-invest-in-search-startup-perplexity-ai/) in Perplexity AI comes amid a flurry of copyright infringement notices from publishers, suggesting that even high-growth AI startups must now budget heavily for data compliance and settlements. The same pattern shows in Harvey's reported $100 million funding round (https://techcrunch.com/2024/06/26/legal-ai-startup-harvey-is-raising-100m-at-a-1-5b-valuation/), which would value the legal-data specialist at an estimated $1.5 billion (https://techcrunch.com/2024/06/26/legal-ai-startup-harvey-is-raising-100m-at-a-1-5b-valuation/), a premium driven by its access to proprietary, high-stakes legal datasets.

Why it matters for data owners

The EvolutionaryScale deal suggests that the most lucrative frontier for data monetization is shifting from general web content to "domain-specific world models." For data owners in biology, law, and music, the market is moving beyond simple licensing toward a strategic partnership model in which the data is the primary catalyst for scientific and creative breakthroughs. As AI labs like OpenAI and Anthropic exhaust public web data, the premium on clean, proprietary, and legally cleared datasets is likely to keep rising, turning passive archives into high-yield financial assets.
