TY - JOUR
T1 - Adaptive prediction method based on alternating decision forests with considerations for generalization ability
AU - Misawa, Shotaro
AU - Mikawa, Kenta
AU - Goto, Masayuki
N1 - Funding Information:
The authors would like to thank Mr. Gendo Kumoi, Dr. Haruka Yamashita, and all members of the Goto Laboratory for their support of our research. A portion of this study was supported by JSPS KAKENHI Grant Numbers 26282090 and 26560167.
Publisher Copyright:
© 2017 KIIE
PY - 2017/9
Y1 - 2017/9
N2 - Many machine learning algorithms have been proposed and applied to a wide range of prediction problems in the field of industrial management. As data volumes grow, machine learning algorithms with low computational costs and efficient ensemble methods are needed. The Alternating Decision Forest (ADF) is an efficient ensemble method known for its high performance and low computational cost. ADFs assign each training example a weight representing how accurately that example is predicted, and they randomly select attribute variables at each node. This allows the ensemble to predict the training data accurately while each decision tree retains different features. However, outliers can cause overfitting, and because the candidate branch conditions vary from node to node in ADFs, prediction accuracy may deteriorate when the fit to the training data is strongly constrained. To improve prediction accuracy, we focus on the prediction results for new data. Specifically, we introduce bootstrap sampling so that the algorithm generates an out-of-bag (OOB) dataset for each tree in the training phase. We then construct an effective ensemble of decision trees that improves generalization ability by taking into account the prediction accuracy on the OOB data. To verify the effectiveness of the proposed method, we conduct simulation experiments using datasets from the UCI Machine Learning Repository. The proposed method provides robust and accurate predictions for datasets with many attribute variables.
AB - Many machine learning algorithms have been proposed and applied to a wide range of prediction problems in the field of industrial management. As data volumes grow, machine learning algorithms with low computational costs and efficient ensemble methods are needed. The Alternating Decision Forest (ADF) is an efficient ensemble method known for its high performance and low computational cost. ADFs assign each training example a weight representing how accurately that example is predicted, and they randomly select attribute variables at each node. This allows the ensemble to predict the training data accurately while each decision tree retains different features. However, outliers can cause overfitting, and because the candidate branch conditions vary from node to node in ADFs, prediction accuracy may deteriorate when the fit to the training data is strongly constrained. To improve prediction accuracy, we focus on the prediction results for new data. Specifically, we introduce bootstrap sampling so that the algorithm generates an out-of-bag (OOB) dataset for each tree in the training phase. We then construct an effective ensemble of decision trees that improves generalization ability by taking into account the prediction accuracy on the OOB data. To verify the effectiveness of the proposed method, we conduct simulation experiments using datasets from the UCI Machine Learning Repository. The proposed method provides robust and accurate predictions for datasets with many attribute variables.
KW - Alternating Decision Forests
KW - Big Data
KW - Data Mining
KW - Prediction Model
KW - Random Forests
UR - http://www.scopus.com/inward/record.url?scp=85033549986&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85033549986&partnerID=8YFLogxK
U2 - 10.7232/iems.2017.16.3.384
DO - 10.7232/iems.2017.16.3.384
M3 - Article
AN - SCOPUS:85033549986
SN - 1598-7248
VL - 16
SP - 384
EP - 391
JO - Industrial Engineering and Management Systems
JF - Industrial Engineering and Management Systems
IS - 3
ER -