TY - JOUR
T1 - Consideration to apply the Mahalanobis-Taguchi method to small sample data
AU - Ohkubo, Masato
AU - Nagata, Yasushi
PY - 2015
Y1 - 2015
N2 - The Mahalanobis-Taguchi (MT) method is a standard multivariate analysis method for detecting anomalies and recognizing patterns, and a number of case studies using it have been reported. However, it performs well only when a sufficient number of samples is available; when the number of samples is insufficient, the method suffers from a large probability bias. In this paper, we first analyze existing approaches to this problem, which commonly rely on dimension reduction such as variable selection, and show that they have problems when testing unknown data. Second, we propose two analytical procedures for small sample data that take into account the detection capability for unknown data. In these procedures, when the number of samples is small relative to the number of variables, the detection measure of the MT method is replaced by a measure derived either by approximating correlation matrices using probabilistic principal component analysis (PPCA) or by introducing ensemble learning. Finally, based on an analysis of raw data from the KDDCup99 dataset and on simulation results, we consider how the proposed procedures should be applied when multicollinearity occurs and which of the two procedures should be applied according to the data pattern.
AB - The Mahalanobis-Taguchi (MT) method is a standard multivariate analysis method for detecting anomalies and recognizing patterns, and a number of case studies using it have been reported. However, it performs well only when a sufficient number of samples is available; when the number of samples is insufficient, the method suffers from a large probability bias. In this paper, we first analyze existing approaches to this problem, which commonly rely on dimension reduction such as variable selection, and show that they have problems when testing unknown data. Second, we propose two analytical procedures for small sample data that take into account the detection capability for unknown data. In these procedures, when the number of samples is small relative to the number of variables, the detection measure of the MT method is replaced by a measure derived either by approximating correlation matrices using probabilistic principal component analysis (PPCA) or by introducing ensemble learning. Finally, based on an analysis of raw data from the KDDCup99 dataset and on simulation results, we consider how the proposed procedures should be applied when multicollinearity occurs and which of the two procedures should be applied according to the data pattern.
KW - Ensemble learning
KW - MT method
KW - Probabilistic principal component analysis
KW - Taguchi method
UR - http://www.scopus.com/inward/record.url?scp=84931080651&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84931080651&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84931080651
SN - 0386-4812
VL - 66
SP - 30
EP - 38
JO - Journal of Japan Industrial Management Association
JF - Journal of Japan Industrial Management Association
IS - 1
ER -