Publication Type

Conference Proceeding Article

Version

Published version

Publication Date

August 2021

Abstract

Data is one of the most critical resources in the AI era. While substantial research has been dedicated to training machine learning models on various types of data, much less effort has been invested in assessing and governing data assets in the end-to-end processes of machine learning and data science, that is, the pipeline in which data is collected and processed, and machine learning models are then produced, requested, deployed, shared and evolved. To provide a state-of-the-art overview of this important and novel area and to promote related research and development, we present a tutorial addressing two essential problems. First, in the machine learning pipeline, how can data and machine learning models be priced properly so that the contributions of various parties are assessed and recognized fairly? Second, when many parties collaborate in building, distributing and sharing machine learning models, how can data be managed as an asset? Accordingly, the first part of our proposal surveys data and model pricing in the machine learning pipeline, while the second part discusses data asset governance for collaborative artificial intelligence. Each part is self-contained. At the same time, the two parts complement each other and connect a series of interesting and important problems into a dynamic big picture.

Keywords

Data asset, Data pricing, Data governance, Consensus, Blockchain, Privacy, Federated learning

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 14-18, 2021

First Page

4058

Last Page

4059

Identifier

10.1145/3447548.3470818

City or Country

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
