Dataset opportunity
Virta — Knowledge Base Dataset Opportunity
Large knowledge base dataset held by Virta, usable for Document Intelligence and RAG.
Score
79.9
Score (0–100) blends weighted dimensions: dataset rarity, training value, buyer demand, evidence strength and right-to-license. 70+ is deal-ready. See the scored dimensions below for the breakdown.
Confidence
92%
Action
Data Sharing Agreement
The recommended deal structure for this dataset is one of: Acquire (full buyout), License (paid usage rights), Data Sharing Agreement (controlled access, no transfer of ownership), Partnership (co-development) or Annotation Program (labeling). It is chosen from data ownership, licensing complexity and accessibility.
Market
Global Intelligent Document Processing market size was valued at USD 2.3 billion in 2024 and is projected to grow at a CAGR of 24.7% between 2025 and 2034. [2]
Recent dated external facts that triggered this opportunity — auditable provenance.
- 📰 press, 2026-06-15: With Thales, Renault Group strengthens its presence in the defense market (journalauto.com)
- 📰 press, 2026-06-12: Automotive suppliers call for a strengthening of the Industrial Accelerator Act (journalauto.com)
- 📰 press, 2026-06-12: Chery France bolsters its management team to support its commercial development (journalauto.com)
- 📰 press, 2026-06-12: Belgium in turn approves Tesla's autonomous driving system (journalauto.com)
- 📰 press, 2026-06-12: Cédric Lacour and Gaël de Beauchesne, first hires at GAC Motor France (journalauto.com)
Lineage
How this lead was derived
The signal-first chain, end to end: recent external signals → qualified niche → resolved data-holder → site verification → scored opportunity. Every lead is explainable.
Concrete evidence this company actively cares about data — why it's ripe for the deal room.
- 🔌 Public API: Public Virta API for charging network management and data integration (source)
Profile
Dataset profile
Type: Knowledge Base Dataset
Modality: Text
Sector: Mobility
Volume: Large
Freshness: Real-time
Rarity: High (proprietary)
Accessibility: Partial
Legal: Mixed ownership, GDPR-sensitive (PII review)
Buyer persona: Document-AI / IDP vendors
Virta holds a comprehensive Knowledge Base dataset in Text modality, derived from its extensive electric vehicle charging platform operations. This includes technical documentation, API guides, support articles, and operational procedures, making it a prime asset for training a Document Intelligence AI. Such an AI could automate customer support, enhance developer onboarding, and extract insights to streamline platform management.
The global Intelligent Document Processing market, a proxy for this use case, was valued at $2.3 billion in 2024 and is projected to grow at a 24.7% CAGR between 2025 and 2034. [2] Access is complicated by data ownership shared with Charge Point Operators and by high GDPR sensitivity due to driver data, but the dataset remains valuable: its specificity to the EV charging domain offers a rare basis for a highly specialized AI model, which justifies the effort of working through the necessary anonymization and consent frameworks.
⚠ Diligence (valuable data, access to negotiate):
- Data ownership is shared with Charge Point Operators (CPOs) using the platform.
- High GDPR sensitivity due to EV driver location and charging habits.
- Requires complex anonymization of individual charging sessions and payment records.
- Northe subsidiary collects direct vehicle telemetry via OBDII, which may have different consent terms.
Corporate: independent.
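To make the quoted market figure concrete, the sketch below compounds the 2024 valuation forward at the stated CAGR. It is only an illustration under one assumption that the text does not confirm: that 2024 is the base year, so 2034 is ten compounding periods away. The function and constant names are hypothetical.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound a base value forward by `years` periods at a constant annual rate."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the text: USD 2.3 billion in 2024, 24.7% CAGR.
BASE_2024_USD_BN = 2.3
CAGR = 0.247

# ASSUMPTION (not stated in the source): 2024 is the base year,
# so 2034 is 10 compounding periods away.
YEARS_TO_2034 = 2034 - 2024

projected_2034 = project_market_size(BASE_2024_USD_BN, CAGR, YEARS_TO_2034)
print(f"Implied 2034 market size: USD {projected_2034:.1f} billion")
```

If the cited report compounds from 2025 instead, the number of periods drops to nine and the implied figure is correspondingly lower, so the result should be read as an order-of-magnitude check and not as a forecast.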
Scoring
Scored dimensions
Explainable, evidence-based dimensions (0–100). The radar shows the investment axes.
This evidence collectively proves Virta owns a comprehensive, proprietary knowledge base covering the complex electric vehicle (EV) charging ecosystem. This dataset includes technical API documentation, product guides, changelogs, and support articles. For Document AI and Intelligent Document Processing (IDP) vendors, this is a rare source of domain-specific text essential for training models to understand the mobility sector's unique document formats. In a market projected to grow at over 24% annually, this dataset offers a significant competitive advantage for building next-generation document intelligence solutions.
- Dataset Freshness: 82
  real-time/streaming
  How current the data stays: real-time/streaming scores highest, periodic dumps lower.
- Training Value: 84
  fit for Document Intelligence
  How useful the data is for the target AI use-case: its fit for model training or fine-tuning.
- Dataset Specificity: 100
  dominant 'knowledge_base', sector mobility, 4 specific types
  How sharply the data targets a specific, hard-to-substitute domain or task. Niche, well-defined data scores higher than generic.
- Dataset Rarity: 70
  proprietary domain data (open lowers rarity)
  How scarce and proprietary the data is. Unique domain data scores high; openly available data lowers it.
- Dataset Volume: 100
  24 evidence hits, explicit data-volume mention
  Apparent scale of the data, inferred from the number of evidence hits and any explicit volume mentions.
- Buyer Demand: 85
  The demand is driven by two converging, high-growth markets: the AI in Mobility market, projected to grow at a 44.6% CAGR (2026-2035), and the Intelligent Document Processing (IDP) market, a proxy for Document Intelligence, which is growing
  How strongly AI builders and companies are likely to want this data, based on market signals.
- Legal Accessibility: 60
  open/API access
  How legally easy the data is to obtain and use: open/API access scores high; PII or regulated data scores low.
- Acquisition Feasibility: 68
  high difficulty, independent
  How realistic it is to actually obtain the data, given access difficulty and the holder's corporate structure.
- Evidence Strength: 100
  9 evidence types, 24 hits
  How solid the proof is that the company holds this data, based on diversity of evidence types and number of hits.
- Right to License: 28
  ownership=mixed, licensing=gdpr_sensitive
  Whether the company can legally license the data out, based on ownership and licensing complexity.
- Corporate Independence: 90
  independent
  Whether the holder can decide alone: an independent company scores higher than a subsidiary of a large group.
- Data Orientation: 39
  1 data-appetite signals (1 types)
  How actively the company invests in data, measured by its data-appetite signals (hires, products, APIs…).
- Dormant Data Surplus: 92
  surplus=high, 5 recent external signals; proprietary data beyond what's already monetised
  Volume and value of proprietary data this company holds BEYOND what it already monetises: the dormant surplus we can unlock. A company can sell some insights AND still sit on a far larger dormant asset.
- ICP Audit: 75
  ⚠ review: The company's core business is selling an EV charging management platform (SaaS) and derived intelligence/analytics via APIs, which is a form of selling intelligence, making it a bad fit. Issues: The company's core product is a Charge Point Management System (CPMS) called Virta Hub, which is a software platform for businesses to operate EV charging netwo; Virta explicitly offers 'Data access & analytics' and a suite of APIs for customers to integrate Virta's data and functionalities i
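The score description says the overall figure blends weighted dimensions (dataset rarity, training value, buyer demand, evidence strength and right-to-license) and that 70+ is deal-ready. The report does not publish the weights, so the following is a minimal sketch of a weighted blend using equal weights as a purely hypothetical choice. The names `blended_score`, `weights` and `DEAL_READY_THRESHOLD` are illustrative and not part of any Virta or platform API.

```python
# Threshold taken from the score description: "70+ is deal-ready".
DEAL_READY_THRESHOLD = 70.0

# Dimension scores as listed in this report.
scores = {
    "dataset_rarity": 70,
    "training_value": 84,
    "buyer_demand": 85,
    "evidence_strength": 100,
    "right_to_license": 28,
}

# HYPOTHETICAL weights: equal weighting, chosen only for illustration.
# The report does not disclose the real weights.
weights = {name: 1.0 for name in scores}


def blended_score(scores: dict, weights: dict) -> float:
    """Weighted average of dimension scores, normalised by total weight."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


overall = blended_score(scores, weights)
print(f"Blended score: {overall:.1f}")
print("Deal-ready" if overall >= DEAL_READY_THRESHOLD else "Below threshold")
```

With equal weights the blend comes to 73.4, not the published 79.9, which suggests the actual weighting favours the higher-scoring dimensions or includes dimensions beyond these five. Either way, the low Right to License score (28) pulls the total down noticeably.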
Evidence
Dataset evidence & lineage
What the typed evidence proves the company holds — reframed for clarity and set against the market.
Downloads / exports
This indicates a collection of structured product communications and support materials, such as release notes, which are ideal for training models on product updates and customer-facing documents.
Event streams
This points to documentation describing real-time data protocols like OCPP, which is essential for training AI to understand technical specifications for IoT and mobility data streams.
Industrial data
This shows documentation exists for complex industrial use cases, including enterprise system integration (ERP, CRM) and energy management, a high-value niche for specialized document AI.
API access
This proves the existence of structured documentation detailing core platform capabilities, valuable for training models to parse API specifications and technical feature lists.
Knowledge base / docs
This is direct evidence of a centralized repository of technical knowledge, including guides and changelogs, representing a goldmine for training language models on complex support articles.
Developer portal
This confirms a formal, well-structured portal with extensive API documentation, providing high-value, real-world content for training models to understand technical developer guides.
Data-volume signal
This sample describes data access policies and analytics integration, providing text useful for training models to understand data governance and usage instructions within user guides.
IoT / sensor data
This is evidence of documentation explaining the company's IoT data infrastructure, crucial for training models to understand the context of connected device data in technical manuals.
Geospatial data
This indicates the presence of documentation related to geospatial analytics, a specialized domain for document intelligence models focused on location-based services and logistics.
Deliverable
Premium dataset report
Virta Knowledge Base: a large knowledge base dataset (Text modality) in the mobility domain. Primary AI use-case: Document Intelligence. Market signal: the global Intelligent Document Processing market was valued at USD 2.3 billion in 2024 and is projected to grow at a CAGR of 24.7% between 2025 and 2034. [2] Investment score: 79.9/100 (confidence 92%). Recommended action: Data Sharing Agreement.