Dataset opportunity
Cleanpower — Search & Query Logs Dataset Opportunity
Large search & query logs dataset held by Cleanpower, usable for RAG and Search Relevance.
Score
84.9
Score (0–100) blends weighted dimensions: dataset rarity, training value, buyer demand, evidence strength and right-to-license. 70+ is deal-ready. See the scored dimensions below for the breakdown.
Confidence
92%
Action
Acquire
The recommended deal structure for this dataset: Acquire (full buyout), License (paid usage rights), Data Sharing Agreement (controlled access, no transfer of ownership), Partnership (co-development) or Annotation Program (labeling). Chosen from data ownership, licensing complexity and accessibility.
Market
Global Retrieval Augmented Generation (RAG) market = $1.3B in 2024, CAGR 49.9% (2025-2034)
Recent dated external facts that triggered this opportunity — auditable provenance.
- 📰 press, 2026-06-05: EDF reportedly about to divest its renewables in North America (greenunivers.com)
- 📰 press, 2026-06-04: Colorado co-op delivers 100% renewables in March, a first (utilitydive.com)
- 📰 press, 2026-06-04: Protesters target NV Energy at electric utility conference as anger over affordability rises (utilitydive.com)
- 📰 press, 2026-06-04: Electric sector needs firm gas supply to protect grid reliability, gas industry report says (utilitydive.com)
- 📰 press, 2026-06-04: Speed to power requires more transmission, not less competition (utilitydive.com)
Lineage
How this lead was derived
The signal-first chain, end to end: recent external signals → qualified niche → resolved data-holder → site verification → scored opportunity. Every lead is explainable.
Concrete evidence this company actively cares about data — why it's ripe for the deal room.
- 📦 Data product: SolarAnywhere® Products: Historical Data, Real-Time Data, Forecast Data
- 🔌 Public API: Clean Power Research API for custom applications and data interaction
- 🧑💻 Hiring a data role: DJ Mann, Data Manager
- ✨ Signal: Research Team pioneering state-of-the-art analytical methods for clean energy
Profile
Dataset profile
Type
Search & Query Logs Dataset
Modality
Text
Sector
other
Volume
Large
Freshness
Real-time
Rarity
High (proprietary)
Accessibility
Restricted
Legal
Mixed ownership — clean to license · PII/regulated
Buyer persona
LLM application teams & enterprise search vendors
Cleanpower holds a rich Search & Query Logs Dataset in Text modality, augmented by geo_data, industrial_data, iot_data, and transaction_data, making it exceptionally valuable for Retrieval Augmented Generation (RAG) applications. This diverse collection provides deep contextual understanding, enabling AI models to generate highly accurate and relevant responses by grounding them in real-world operational and user interaction data. The presence of API access, significant data_volume, and event_streams further enhances its utility for dynamic RAG systems requiring continuous updates and broad coverage.
The RAG market is growing rapidly: it is projected to reach USD 74.5 billion by 2034 at a CAGR of 49.9% (2025-2034), while the broader AI training dataset market (where text data holds a significant share) is expected to reach USD 22.7 billion by 2034 at a CAGR of 20.6% (2026-2034). The deal has complications: existing data products (SolarAnywhere) require careful negotiation, customer-owned data needs consent, and the company already sells derived insights. Even so, the dormant surplus data remains valuable. Its rarity and depth, especially the combination of search logs with specialized industrial and geospatial context, present a distinctive opportunity for buyers seeking to enhance their AI capabilities.
⚠ Diligence (valuable data, access to negotiate):
- Existing data products (SolarAnywhere) are already sold, requiring careful negotiation to avoid disintermediation.
- Some data is customer-owned (e.g., utility operational data processed by PowerClerk), requiring client consent.
- The company already sells a derived insight/analytics product; the opportunity is the dormant surplus beyond it.
Corporate: independent.
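The two RAG market figures quoted in this report (USD 1.3 billion in 2024 and USD 74.5 billion by 2034) can be cross-checked with a compound-growth calculation. The sketch below assumes that 2024 is the base year and that the 49.9% rate compounds annually over ten periods; the function name is illustrative and not part of any report tooling.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound `base_value` forward by `years` at annual growth rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the report: USD 1.3B in 2024, CAGR 49.9%.
# Assumption: 2024 is the base year, so 2034 is 10 compounding periods away.
rag_2034 = project_market_size(base_value=1.3, cagr=0.499, years=10)
print(f"Projected 2034 RAG market: USD {rag_2034:.1f}B")
```

Under these assumptions the result lands at roughly USD 74.5 billion, so the 2024 base value and the 2034 projection are consistent with each other.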
Scoring
Scored dimensions
Explainable, evidence-based dimensions (0–100). The radar shows the investment axes.
Cleanpower holds a proprietary dataset of search and query logs derived from its energy-focused platforms, offering direct insight into user intent and information needs. This text-modality data is valuable for LLM application teams and enterprise search vendors operating in the rapidly expanding Retrieval Augmented Generation (RAG) market, valued at $1.3B in 2024 and growing at a 49.9% CAGR (2025-2034). For buyers, these logs can support fine-tuning models, improving retrieval accuracy, and understanding the specific information demands of a sophisticated user base in the energy sector, drawing on Cleanpower's domain expertise and established data infrastructure serving over 80 utilities and 200 solar industry players.
- Dataset Specificity: 100
  dominant 'search_logs', sector other, 6 specific types
  How sharply the data targets a specific, hard-to-substitute domain or task. Niche, well-defined data scores higher than generic.
- Dataset Rarity: 100
  proprietary domain data
  How scarce and proprietary the data is. Unique domain data scores high; openly available data lowers it.
- Dataset Volume: 100
  24 evidence hits, explicit data-volume mention
  Apparent scale of the data, inferred from the number of evidence hits and any explicit volume mentions.
- Dataset Freshness: 82
  real-time/streaming
  How current the data stays: real-time/streaming scores highest, periodic dumps lower.
- Training Value: 100
  fit for RAG
  How useful the data is for the target AI use-case, i.e. its fit for model training or fine-tuning.
- Buyer Demand: 95
  The Retrieval-Augmented Generation (RAG) market is projected to grow at a Compound Annual Growth Rate (CAGR) of 49.9% from 2024 to 2034, and search and query logs are explicitly identified as essential "AI Search Data" for powering RAG syst
  How strongly AI builders and companies are likely to want this data, based on market signals.
- Legal Accessibility: 28
  open/API access
  How legally easy the data is to obtain and use: open/API access scores high; PII or regulated data scores low.
- Acquisition Feasibility: 0
  medium difficulty, independent
  How realistic it is to actually obtain the data, given access difficulty and the holder's corporate structure.
- Evidence Strength: 100
  11 evidence types, 24 hits
  How solid the proof is that the company holds this data, measured by diversity of evidence types and number of hits.
- Right to License: 58
  ownership=mixed, licensing=clean
  Whether the company can legally license the data out, based on ownership and licensing complexity.
- Corporate Independence: 90
  independent
  Whether the holder can decide alone: an independent company scores higher than a subsidiary of a large group.
- Data Orientation: 90
  4 data-appetite signals (4 types)
  How actively the company invests in data, measured by its data-appetite signals (hires, products, APIs…).
- Dormant Data Surplus: 92
  surplus=high, 5 recent external signals; proprietary data beyond what's already monetised
  Volume and value of proprietary data this company holds BEYOND what it already monetises: the dormant surplus we can unlock. A company can sell some insights AND still sit on a far larger dormant asset.
- ICP Audit: 50
  ⚠ review: CleanPower is a commercial janitorial cleaning service with a real operational business and SME size, but its core activities do not generate 'Search & Query Logs Dataset' as a by-product, making it a poor fit for this specific data opportunity. Issues: The company's core business is commercial cleaning, which does not generate 'Search & Query Logs Dataset' as a by-product of its operations; The specified 'Search & Query Logs Dataset Opportunity' is misaligned with the company's actu
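The headline score is described as a blend of weighted dimensions, but the report does not publish the weights. As a minimal sketch, assuming equal weights (a hypothetical choice) over the five dimensions named in the score description, a weighted average and the 70+ deal-ready check could look like this. The function and variable names are illustrative only.

```python
from typing import Dict

DEAL_READY_THRESHOLD = 70.0  # "70+ is deal-ready" per the score description


def blended_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of dimension scores (each on a 0-100 scale)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def is_deal_ready(score: float) -> bool:
    """Apply the report's 70+ threshold."""
    return score >= DEAL_READY_THRESHOLD


# Dimension scores as listed in this report.
scores = {
    "dataset_rarity": 100.0,
    "training_value": 100.0,
    "buyer_demand": 95.0,
    "evidence_strength": 100.0,
    "right_to_license": 58.0,
}
# Hypothetical equal weights; the report's actual weights are not stated.
weights = {name: 1.0 for name in scores}

score = blended_score(scores, weights)
print(f"Blended score: {score:.1f}, deal-ready: {is_deal_ready(score)}")
```

With equal weights this gives 90.6 rather than the reported 84.9, which is consistent with the actual blend using different weights or including further dimensions.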
Evidence
Dataset evidence & lineage
What the typed evidence proves the company holds — reframed for clarity and set against the market.
API access
This evidence confirms Cleanpower's established history of providing programmatic access to its trusted energy data and calculation tools, enabling developers to integrate and build custom applications, demonstrating a mature data infrastructure.
Developer portal
This highlights Cleanpower's significant B2B presence, serving over 80 electric utilities and 200+ solar industry leaders with specialized solutions, underscoring the high value and industry relevance of their data and platforms.
Geospatial data
This confirms Cleanpower's capability to integrate and provide global solar irradiance data and other geospatial information, essential for location-specific energy resource assessment and planning.
Search / query logs
Directly confirming the existence of the target dataset, this evidence shows Cleanpower actively records website search interactions and preferences using Site Search 360, providing direct insight into user information needs and content relevance.
Event streams
This indicates Cleanpower collects and provides dynamic real-time and historical data streams, including forecasts, which are critical for operational insights and predictive analytics in the energy sector.
Schema / data dictionary
This points to well-defined data specifications and analytical models, such as those for identifying PV, storage, and EVs from utility data, indicating structured and interpretable datasets valuable for AI consumption.
Transaction data
This evidence suggests Cleanpower possesses data related to energy transactions and adoption scenarios, offering insights into market activity and consumer behavior within the clean energy space.
IoT / sensor data
This confirms the availability of real-time satellite-derived irradiance data for PV production estimation, showcasing Cleanpower's expertise in collecting and leveraging sensor-like data for critical energy applications.
Industrial data
This highlights Cleanpower's provision of specialized DER data and insights via platforms like FleetView, crucial for grid planning and operations within the industrial energy sector.
Data-volume signal
This demonstrates the substantial scale of Cleanpower's data collection, exemplified by a virtual energy audit for nearly 350,000 residential homes, indicating comprehensive coverage and statistical robustness.
Knowledge base / docs
This reveals Cleanpower's commitment to state-of-the-art analytical methods and ongoing research, ensuring the quality, depth, and continuous improvement of their data and software services.
Coverage
Scanned sources
Deliverable
Premium dataset report
Cleanpower Search & Query Logs: a large search & query logs dataset (Text modality), sector: other. Primary AI use-case: RAG. Market signal: Global Retrieval Augmented Generation (RAG) market = $1.3B in 2024, CAGR 49.9% (2025-2034). Investment score 84.9/100 (confidence 0.92). Recommended action: Acquire.