Research Collection School Of Computing and Information Systems

Learning to query: Focused web page harvesting for entity aspects

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

5-2016

Abstract

As the Web hosts rich information about real-world entities, our information quests become increasingly entity-centric. In this paper, we study the problem of focused harvesting of Web pages for entity aspects, to support downstream applications such as business analytics and building a vertical portal. Given that search engines are the de facto gateways to access information on the Web, we recognize the essence of our problem as Learning to Query (L2Q): to intelligently select queries so that we can harvest pages, via a search engine, focused on an entity aspect of interest. Thus, it is crucial to quantify the utilities of the candidate queries w.r.t. some entity aspect. In order to better estimate the utilities, we identify two opportunities and address their challenges. First, a target entity in a given domain has many peers. We leverage these peer entities to become domain aware. Second, a candidate query may “overlap” with the past queries that have already been fired. We account for these past queries to become context aware. Empirical results show that our approach significantly outperforms both algorithmic and manual baselines by 16% and 10% in F-scores, respectively.

Keywords

Harvesting, Websites, business analytics

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

2016 IEEE 32nd International Conference on Data Engineering ICDE 2016: Helsinki; Finland, May 16-20, Proceedings

First Page

1002

Last Page

1013

ISBN

9781509020195

Identifier

10.1109/ICDE.2016.7498308

Publisher

IEEE Computer Society

City or Country

Los Alamitos, CA

Citation

FANG, Yuan; ZHENG, Vincent W.; and CHANG, Kevin Chen-Chuan. Learning to query: Focused web page harvesting for entity aspects. (2016). 2016 IEEE 32nd International Conference on Data Engineering ICDE 2016: Helsinki; Finland, May 16-20, Proceedings. 1002-1013.
Available at: https://ink.library.smu.edu.sg/sis_research/4066

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/ICDE.2016.7498308

Included in

Databases and Information Systems Commons