Publication Type

Conference Proceeding Article

Version

Postprint

Publication Date

2-2015

Abstract

With the popularity of smartphones and mobile devices, the number of mobile applications (a.k.a. "apps") has been growing rapidly. Detecting semantically similar apps in a large pool of apps is a basic and important problem, with direct uses in tasks such as app recommendation and app search. However, no systematic and comprehensive work has so far addressed this problem. To fill this gap, we explore multi-modal heterogeneous data in app markets (e.g., description text, images, and user reviews) and present "SimApp", a novel framework for detecting similar apps using machine learning. Specifically, it consists of two stages: (i) a variety of kernel functions are constructed to measure app similarity for each modality of data; and (ii) an online kernel learning algorithm is proposed to learn the optimal combination of the similarity functions of the multiple modalities. We evaluate SimApp in an extensive set of experiments on a real-world dataset crawled from Google Play; the encouraging results indicate that SimApp is effective and promising.
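
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes each app is a dictionary of per-modality feature vectors, uses cosine similarity as a stand-in kernel for every modality, and learns the combination weights from hypothetical triplets (anchor, similar app, dissimilar app) with a hinge-loss gradient step followed by clipping and renormalization. All names (`cosine_kernel`, `combined_similarity`, `online_update`), the learning rate, and the margin are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: each app maps a modality name to a feature vector.
App = Dict[str, Sequence[float]]
Kernel = Callable[[App, App], float]


def cosine_kernel(modality: str) -> Kernel:
    """Stage (i): build a cosine-similarity kernel for one modality."""
    def kernel(a: App, b: App) -> float:
        x, y = a[modality], b[modality]
        dot = sum(p * q for p, q in zip(x, y))
        norm = (sum(p * p for p in x) ** 0.5) * (sum(q * q for q in y) ** 0.5)
        return dot / norm if norm else 0.0
    return kernel


def combined_similarity(weights: Sequence[float], kernels: Sequence[Kernel],
                        a: App, b: App) -> float:
    """Weighted sum of the per-modality kernel values."""
    return sum(w * k(a, b) for w, k in zip(weights, kernels))


def online_update(weights: Sequence[float], kernels: Sequence[Kernel],
                  triplet: Tuple[App, App, App],
                  lr: float = 0.1, margin: float = 0.5) -> List[float]:
    """Stage (ii), sketched: one hinge-loss step on a single triplet."""
    anchor, similar, dissimilar = triplet
    # Per-kernel evidence that `similar` is closer to the anchor than `dissimilar`.
    diffs = [k(anchor, similar) - k(anchor, dissimilar) for k in kernels]
    if sum(w * d for w, d in zip(weights, diffs)) >= margin:
        return list(weights)  # margin already satisfied: no update
    # Gradient step on the hinge loss, then clip at zero and renormalize
    # so the weights stay non-negative and sum to one (a simple heuristic).
    stepped = [max(0.0, w + lr * d) for w, d in zip(weights, diffs)]
    total = sum(stepped)
    return [w / total for w in stepped] if total else list(weights)


# Toy data (invented): the text modality agrees with the triplet, images do not.
kernels = [cosine_kernel("text"), cosine_kernel("image")]
weights = [0.5, 0.5]
anchor = {"text": [1.0, 0.0], "image": [1.0, 0.0]}
similar = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
dissimilar = {"text": [0.0, 1.0], "image": [1.0, 0.0]}
for _ in range(10):
    weights = online_update(weights, kernels, (anchor, similar, dissimilar))
```

On this toy triplet the text kernel ends up with the larger weight, so the combined similarity ranks the similar app above the dissimilar one. Processing one triplet at a time is what makes the update online: the weights can be refined as new feedback arrives without retraining on the full dataset.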

Keywords

Mobile applications, similarity function, multi-modal data, multiple kernels, online kernel learning

Discipline

Databases and Information Systems

Research Areas

Data Management and Analytics

Publication

WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 31 January - 6 February, Shanghai

First Page

305

Last Page

314

ISBN

9781450333177

Identifier

10.1145/2684822.2685305

Publisher

ACM

City or Country

New York

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License

Additional URL

http://doi.org/10.1145/2684822.2685305
