Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2017

Abstract

Data fusion is the fundamental research problem of identifying the true values of data items of interest from conflicting multi-sourced data. Although considerable research effort has been devoted to this topic, existing approaches generally assume that every data item has exactly one true value, which fails to reflect the real world, where data items with multiple true values are common. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Two aspects of source reliability are then derived from these graphs and used both for estimating value veracity and for initializing existing data fusion methods. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.

Keywords

Data integration, Data fusion, Multi-valued data items, Inter-source agreements

Discipline

Databases and Information Systems | Data Storage Systems

Publication

Conceptual modeling: ER 2017: 36th International Conference, Valencia, Spain, November 6-9: Proceedings

Volume

10650

First Page

164

Last Page

172

ISBN

9783319699042

Identifier

10.1007/978-3-319-69904-2_13

Publisher

Springer

City or Country

Cham

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/978-3-319-69904-2_13
