TY - GEN
T1 - A Hypothesis Discovery Method for Predicting Change in Multidimensional Time-series Data
AU - Kumoi, Gendo
AU - Goto, Masayuki
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10/11
Y1 - 2020/10/11
N2 - With the development of IoT technology, it has become possible to accumulate and regularly measure multidimensional time-series data. In this study, we focus on the usage of multidimensional time-series data from printer products' log data and propose a method for its analysis. In addition to the number of sheets printed by each customer, the log data includes various time-series information such as the amount of remaining toner, the number of stoppages that occur, and the activation times. To utilize these data for business purposes, it is desirable to construct a model for predicting future changes in use characteristics for each customer. In this study, we apply the random forest algorithm to predict such changes. However, if all measurable features of the problem are included, the model becomes complex and cannot be interpreted. Although the accuracy is relatively high if an appropriate learning algorithm is applied, the complex model tends to overfit the training data. In this paper, we propose a method to select the modeling features that can be interpreted by graph mining while maintaining accuracy. This would enable us to interpret the data at the field level and discover the hypotheses that are necessary for planned marketing policies. Finally, the proposed method is applied to real data and its efficacy is demonstrated.
KW - betweenness centrality
KW - change prediction
KW - customer analysis
KW - hypothesis discovery
KW - random forest
UR - http://www.scopus.com/inward/record.url?scp=85098871335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098871335&partnerID=8YFLogxK
U2 - 10.1109/SMC42975.2020.9282955
DO - 10.1109/SMC42975.2020.9282955
M3 - Conference contribution
AN - SCOPUS:85098871335
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 854
EP - 859
BT - 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020
Y2 - 11 October 2020 through 14 October 2020
ER -