Margin optimization based pruning for random forest
- 信息技术－已发表论文 
This article introduces a margin optimization based pruning algorithm which is able to reduce the ensemble size and improve the performance of a random forest. A key element of the proposed algorithm is that it directly takes into account the margin distribution of the random forest model on the training set. Four different metrics based on the margin distribution are used to evaluate the generalization ability of subensembles and the importance of individual classification trees in an ensemble. After a forest is built, the trees in the ensemble are first ranked according to the margin metrics and subensembles with decreasing sizes are then built by recursively removing the least important trees one by one. Experiments on 10 benchmark datasets demonstrate that our proposed algorithm can significantly improve the generalization performance while reducing the ensemble size at the same time. Furthermore, empirical comparison with other pruning methods indicates that the margin distribution plays an important role in evaluating the performance of a random forest, and can be directly used to select the near-optimal subensembles. (C) 2012 Elsevier B.V. All rights reserved.