Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2010
Abstract
Obtaining high-quality and up-to-date labeled data can be difficult in many real-world machine learning applications, especially for Internet classification tasks like review spam detection, where the data change at a very brisk pace. For some problems, there may exist multiple perspectives, so-called views, of each data sample. For example, in text classification, the typical view contains a large number of raw content features such as term frequency, while a second view may contain a small number of highly informative domain-specific features. We thus propose a novel two-view transductive SVM that takes advantage of both the abundance of unlabeled data and their multiple representations to improve the performance of classifiers. The idea is fairly simple: train a classifier on each of the two views of both labeled and unlabeled data, and impose a global constraint that the two classifiers assign the same class label to every labeled and unlabeled example. We applied our two-view transductive SVM to the WebKB course dataset and to a real-life review spam classification dataset. Experimental results show that our proposed approach performs up to 5% better than a single-view learning algorithm, especially when the amount of labeled data is small. The other advantage of our two-view approach is its significantly improved stability, which is especially useful for noisy real-world data.
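The agreement idea in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation or optimization procedure, which the abstract does not spell out. It substitutes a simple alternating heuristic: each unlabeled example gets one shared label from the summed decision values of the two view classifiers, and both classifiers are then retrained on that shared labeling. All names (train_linear_svm, two_view_fit, c_unlabeled, rounds) and all parameter values are assumptions made for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def decision(model, x):
    """Signed score of a linear model (w, b) on one example."""
    w, b = model
    return dot(w, x) + b

def train_linear_svm(X, y, costs, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on a cost-weighted hinge loss with an L2 penalty."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t, c in zip(X, y, costs):
            violated = t * (dot(w, x) + b) < 1
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage
            if violated:  # hinge loss is active for this example
                w = [wi + lr * c * t * xi for wi, xi in zip(w, x)]
                b += lr * c * t
    return w, b

def two_view_fit(XA_l, XB_l, y_l, XA_u, XB_u, rounds=5, c_unlabeled=0.5):
    """Heuristic two-view training (an assumption, not the paper's solver).

    XA_* and XB_* hold the two views of the same examples, in the same order.
    Labels are +1 or -1. Unlabeled examples get a lower cost (c_unlabeled).
    """
    ones = [1.0] * len(y_l)
    # Start from classifiers trained on the labeled data only.
    model_a = train_linear_svm(XA_l, y_l, ones)
    model_b = train_linear_svm(XB_l, y_l, ones)
    y_u = []
    for _ in range(rounds):
        # One shared label per unlabeled example, so both views are
        # trained toward the same class assignment.
        y_u = [1 if decision(model_a, xa) + decision(model_b, xb) >= 0 else -1
               for xa, xb in zip(XA_u, XB_u)]
        costs = ones + [c_unlabeled] * len(y_u)
        model_a = train_linear_svm(XA_l + XA_u, y_l + y_u, costs)
        model_b = train_linear_svm(XB_l + XB_u, y_l + y_u, costs)
    # y_u is the shared labeling used in the final retraining round.
    return model_a, model_b, y_u
```

Calling two_view_fit with the labeled and unlabeled examples of both views returns the two linear models and the shared labels for the unlabeled set. Sharing one label per example is a hard form of the agreement constraint; a softer variant would instead penalize disagreement between the two decision values.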
Discipline
Computer Sciences | Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
2010 SIAM International Conference on Data Mining: April 29-May 1, Columbus, OH: Proceedings
First Page
235
Last Page
244
ISBN
9780898717037
Identifier
10.1137/1.9781611972801.21
Publisher
SIAM
City or Country
Philadelphia, PA
Citation
LI, Guangxia; HOI, Steven C. H.; and CHANG, Kuiyu.
Two-view Transductive Support Vector Machines. (2010). 2010 SIAM International Conference on Data Mining: April 29-May 1, Columbus, OH: Proceedings. 235-244.
Available at: https://ink.library.smu.edu.sg/sis_research/2360
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1137/1.9781611972801.21
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons