TY - JOUR

T1 - Meta-tree random forest

T2 - Probabilistic data-generative model and Bayes optimal prediction

AU - Dobashi, Nao

AU - Saito, Shota

AU - Nakahara, Yuta

AU - Matsushima, Toshiyasu

N1 - Funding Information:
Funding: This work was supported in part by JSPS KAKENHI Grant Numbers JP17K06446, JP19K04914 and JP19K14989.
Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.

PY - 2021/6

Y1 - 2021/6

AB - This paper deals with the problem of predicting a new target variable corresponding to a new explanatory variable, given a training dataset. To predict the target variable, we consider a model tree, which represents the conditional probabilistic structure of a target variable given an explanatory variable, and discuss statistical optimality for prediction based on Bayes decision theory. The Bayes optimal prediction is obtained by weighting all the model trees in the model tree candidate set, that is, the set of model trees assumed to contain the true model tree. Because the number of model trees in the candidate set grows exponentially with the maximum depth of the model trees, the computational complexity of weighting them also grows exponentially with that depth. To address this issue, we introduce the notion of a meta-tree and propose an algorithm called MTRF (Meta-Tree Random Forest), which uses multiple meta-trees. Theoretical and experimental analyses show that MTRF outperforms previous decision tree-based algorithms.

KW - Bayes decision theory

KW - Data-generative model

KW - Meta-tree

KW - Prediction

KW - Random forest

UR - http://www.scopus.com/inward/record.url?scp=85108879389&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85108879389&partnerID=8YFLogxK

U2 - 10.3390/e23060768

DO - 10.3390/e23060768

M3 - Article

AN - SCOPUS:85108879389

SN - 1099-4300

VL - 23

JO - Entropy

JF - Entropy

IS - 6

M1 - 768

ER -