Abstract: It is widely recognized that data can be viewed as a kind of asset. However, accounting for data assets and pricing data transactions remain difficult due to the lack of reasonable measurements for datasets and data products. The literature on data pricing mainly focuses on traditional pricing models, including models based on the contents of data, market demand, data quality, etc. However, due to the particular nature of data, these models may not be consistent with measure theory and thus suffer from several problems. For example, they do not consider how to price datasets that share common contents, whether a buyer should pay for a repeat purchase, or how to formally define a peak-valley tariff for usage-based pricing. To tackle these problems, in this paper we formally define measure spaces for datasets and data products. Specifically, we introduce measures on discrete, continuous, and product data spaces, respectively. We further introduce the integral and propose a measure-based pricing framework for data products. Our work is parallel to existing pricing models. We focus on how to measure data, and pricing follows as a natural extension by integrating a unit price function with respect to the measure. In contrast, existing models focus on determining total prices directly by considering many factors such as the contents of data and market demand. Through analyses of several real-world applications and cases, we demonstrate the effectiveness and generality of our proposal.
Keywords: Data pricing, data measure, measure theory, data assets