Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2016

Abstract

Searching the Web for information about people is a common practice, but doing so manually is time-consuming. In this paper, we aim to develop an automatic people information search system, named ARISE-PIE. To build such a system, we tackle two major technical challenges: data harvesting and data integration. For data harvesting, we study how to leverage a search engine to help crawl the relevant Web pages for a target entity; we then propose a novel learning-to-query model that automatically selects a set of "best" queries to maximize collective utility (e.g., precision or recall). For data integration, we study how to leverage flexible forms of constraints as weak supervision to achieve collective information extraction from a target entity’s Web page corpus; we then propose a novel conditional probabilistic formulation to model the constraints and an efficient realization that enables inference with constraints. We evaluate our data harvesting and data integration solutions on real-world data sets and show that both achieve better performance than state-of-the-art baselines. We also evaluate our system on a benchmark data set and in a user study, both of which show promising results.

Keywords

Web crawling, Data extraction and integration, Data mining

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Workshop on Data-Driven Talent Acquisition, co-located with ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, October 24-28

First Page

1

Last Page

8

ISBN

9781450340731

Publisher

ACM

City or Country

New York

Citation

ZHENG, Vincent W.; HOANG, Tao; CHEN, Penghe; FANG, Yuan; and YANG, Xiaoyan. ARISE-PIE: A People Information Integration Engine over the Web. (2016). Workshop on Data-Driven Talent Acquisition, co-located with ACM International Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, October 24-28. 1-8.
Available at: https://ink.library.smu.edu.sg/sis_research/4058

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Databases and Information Systems Commons

Research Collection School Of Computing and Information Systems

ARISE-PIE: A People Information Integration Engine over the Web