Funding · Data Licensing · Scale AI · AI Infrastructure
June 22, 2026

Scale AI Secures $1B Series F at $13.8B Valuation

Accel-led round positions Scale AI as the primary 'Data Foundry' for frontier AI model development.

Scale AI has finalized a $1 billion Series F funding round (https://scale.com/blog/series-f) at a post-money valuation of $13.8 billion (https://www.bloomberg.com/news/articles/2024-05-21/scale-ai-raises-1-billion-at-13-8-billion-valuation). Led by Accel, with participation from Nvidia, Amazon, and Meta, the round signals a major shift of capital toward the 'data bottleneck' now facing the developers of the largest large language models (LLMs). As the industry moves beyond the era of 'scraping the open web,' Scale AI’s mission to build a 'Data Foundry' makes this round one of the largest investments to date in expert-labeled, high-density data assets for sovereign and enterprise AI.
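The headline figures imply some simple round arithmetic. The sketch below is illustrative only: it assumes, since the article does not say otherwise, that the entire $1 billion is new primary capital with no secondary share sales, and `round_math` is a hypothetical helper rather than anything from a Scale AI or investor disclosure. Pre-money valuation is post-money minus new money, and the new investors' combined stake is new money divided by post-money.

```python
# Sketch: implied pre-money valuation and new-investor stake for a priced round.
# Assumption (not stated in the source): all $1B is primary capital, no secondary sales.

def round_math(new_money: float, post_money: float) -> tuple[float, float]:
    """Return (pre_money, new_investor_stake) for a priced equity round."""
    if not 0 < new_money <= post_money:
        raise ValueError("new_money must be positive and no larger than post_money")
    pre_money = post_money - new_money   # value of the company before the new cash
    stake = new_money / post_money       # fraction of the company the new cash buys
    return pre_money, stake

pre, stake = round_math(new_money=1.0e9, post_money=13.8e9)
print(f"pre-money: ${pre / 1e9:.1f}B, new-investor stake: {stake:.2%}")
# → pre-money: $12.8B, new-investor stake: 7.25%
```

Under those assumptions, the round implies a pre-money valuation of roughly $12.8 billion and a combined stake of about 7.25% for the Series F investors; any secondary component or option-pool adjustment would change these figures.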

The Industrialization of AI Data

The $1 billion injection (https://scale.com/blog/series-f) is not merely a growth round; it is an infrastructure play. Scale AI is positioning itself as the essential refinery for the raw material of the 21st century: data. The company will use the funding to scale its 'Data Engine,' which supplies the expert human feedback that Reinforcement Learning from Human Feedback (RLHF) depends on to push models past current reasoning plateaus. With investors such as Cisco Investments, Intel Capital, and AMD Ventures (https://www.bloomberg.com/news/articles/2024-05-21/scale-ai-raises-1-billion-at-13-8-billion-valuation) joining the cap table, the deal underscores a cross-industry consensus: the next generation of AI performance will be won through data quality, not just compute volume.

Licensing Deals Reach Fever Pitch

The Scale AI round coincides with an unprecedented wave of direct data-licensing deals between model builders and premium content owners. Most notably, News Corp signed a multi-year deal with OpenAI (https://www.wsj.com/business/media/news-corp-openai-content-licensing-deal-80860d4d) valued at an estimated $250 million over five years. The partnership grants OpenAI access to archived and current content from The Wall Street Journal, Barron’s, and The Times, marking a definitive shift toward licensed, high-authority datasets. Similarly, OpenAI’s partnership with Reddit (https://openai.com/index/openai-and-reddit-partnership/) gives OpenAI access to Reddit’s Data API, which provides real-time Reddit content, allowing human-centric conversational data to be integrated into ChatGPT and other products.
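To put the reported News Corp figure on an annual footing, a straight-line annualization is the simplest approach. This is a minimal sketch assuming equal yearly payments; the article does not give the actual payment schedule or composition of the deal, and `annualized_value` is a hypothetical helper.

```python
# Sketch: straight-line annualization of a multi-year licensing deal.
# Assumption (not from the source): payments are spread evenly across the term.

def annualized_value(total_value: float, years: int) -> float:
    """Average annual value of a deal, assuming equal yearly payments."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_value / years

per_year = annualized_value(total_value=250e6, years=5)
print(f"~${per_year / 1e6:.0f}M per year")  # → ~$50M per year
```

On that assumption the deal averages about $50 million a year, a useful baseline when comparing a recurring licensing stream with a one-time funding event such as Scale AI’s $1 billion Series F.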

Capitalizing on Embodied and Specialized Data

Beyond text-based LLMs, specialized data assets are drawing large capital inflows. Wayve recently secured $1.05 billion (https://www.reuters.com/business/autos-transportation/uk-ai-start-up-wayve-raises-105-bln-softbank-led-funding-2024-05-07/) in a Series C round led by SoftBank to develop 'Embodied AI' for autonomous driving. The deal highlights the premium placed on 'edge data': real-world sensory information that synthetic generation alone cannot replicate. Supporting this data-heavy ecosystem, CoreWeave raised $1.1 billion (https://techcrunch.com/2024/05/01/coreweave-raises-1-1b-at-a-19b-valuation/) to expand its specialized cloud infrastructure, which is built for the throughput that data-intensive AI training workloads require.

The Regulatory and Rights Backlash

As the value of data assets skyrockets, rights holders are moving aggressively to protect their intellectual property. Sony Music Group issued a formal warning to more than 700 AI companies (https://variety.com/2024/music/news/sony-music-warns-ai-companies-using-content-without-permission-1236006080/), explicitly opting its content out of unauthorized scraping for AI training. Combined with a string of licensing agreements, such as OpenAI’s deals with Vox Media (https://www.theverge.com/2024/5/13/24155488/openai-vox-media-licensing-deal-chatgpt) and The Atlantic (https://www.theatlantic.com/press-releases/archive/2024/05/the-atlantic-and-openai-announce-strategic-content-and-product-partnership/678526/), the move suggests that reliance on 'fair use' for training data is rapidly giving way to a structured, multi-billion-dollar marketplace for content rights.

Why it matters for data owners

For institutional data owners, the Scale AI valuation and the News Corp deal confirm that proprietary datasets are no longer secondary assets; they are the primary leverage in the AI economy. Licensing deals worth $250 million, alongside $1 billion funding rounds for data refineries, indicate that 'clean, expert-labeled data' is now a distinct asset class. Owners of unique, high-velocity, or historically deep data should prioritize data governance and 'AI-readiness' to capture the premium valuations that the market's biggest players are now setting.

d-nvest turns the data assets behind these deals into scored, actionable opportunities.
