Research Collection School Of Computing and Information Systems

Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data

Peipei LI, Singapore Management University
Qianhui (Althea) LIANG, Singapore Management University
Xindong WU
X. Hu

Publication Type

Conference Proceeding Article

Publication Date

3-2009

Abstract

The induction error in random tree ensembling results mainly from the strength of decision trees and the dependency between base classifiers. To reduce the errors due to both factors, a Semi-Random Decision Tree Ensembling (SRDTE) model for mining streaming data is proposed, based on our previous work on SRMTDS. The model contains semi-random decision trees that are generated independently and do not interact with each other when making their individual classification decisions. The main idea is to minimize correlation among the classifiers. We claim that the strength of decision trees is closely related to the estimated values of the parameters, including the height of a tree, the number of trees, and the parameter n_min in the Hoeffding bounds. We analyze these parameters of the model and design strategies for better adaptation to streaming data. The main strategies include incremental generation of sub-trees after seeing real training instances, a data structure for quick search, and a voting mechanism for classification. Our evaluation under the 0-1 loss function shows that SRDTE improves performance in terms of predictive accuracy and robustness. We have applied SRDTE to e-business data streams and demonstrated its feasibility and effectiveness.
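
As a rough illustration of two ingredients named in the abstract, the sketch below computes the standard Hoeffding bound and combines per-tree predictions by majority vote. It is not the SRDTE implementation: the function names hoeffding_bound and majority_vote are hypothetical, the reading of n_min as the number of instances observed before the bound is evaluated is an assumption borrowed from common Hoeffding-tree practice, and plain majority voting is only one possible voting mechanism.

```python
import math
from collections import Counter


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of
    a variable with range `value_range` lies within epsilon of the mean
    observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def majority_vote(predictions):
    """Return the most frequent class label among the per-tree predictions
    (assumption: unweighted voting; ties go to the label seen first)."""
    if not predictions:
        raise ValueError("no predictions to combine")
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Assumed setting: n_min instances are collected before the bound is checked.
    for n_min in (50, 200, 1000):
        eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=n_min)
        print(f"n_min={n_min:5d}  epsilon={eps:.4f}")
    print(majority_vote(["buy", "skip", "buy"]))
```

The bound shrinks as the number of observed instances grows, which is why the choice of n_min trades off how quickly a tree reacts against how reliable each decision is.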

Keywords

Random decision trees - data streams - parameter estimation

Discipline

Computer Sciences

Publication

Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

First Page

376

Last Page

388

ISBN

9783642013065

Identifier

10.1007/978-3-642-01307-2_35

Publisher

Springer Verlag

Citation

LI, Peipei; LIANG, Qianhui (Althea); WU, Xindong; and Hu, X. Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data. (2009). Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. 376-388.
Available at: https://ink.library.smu.edu.sg/sis_research/454

Additional URL

http://dx.doi.org/10.1007/978-3-642-01307-2_35
