TY - JOUR
T1 - CCFS
T2 - A Confidence-Based Cost-Effective Feature Selection Scheme for Healthcare Data Classification
AU - Chen, Yiyuan
AU - Wang, Yufeng
AU - Cao, Liang
AU - Jin, Qun
N1 - Funding Information:
The authors would like to thank the anonymous reviewers and editors for their valuable comments, which helped improve the quality of this paper greatly. This work was supported by NSFC under Grant 61801240, and QingLan Project of JiangSu Province.
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2021/5/1
Y1 - 2021/5/1
AB - Feature selection (FS) is a fundamental data processing technique in many machine learning algorithms, especially for the classification of healthcare data. However, it is challenging because of the large search space. Binary Particle Swarm Optimization (BPSO) is an efficient evolutionary computation technique that has been widely used in FS. In this paper, we propose a Confidence-based and Cost-effective feature selection (CCFS) method that uses BPSO to improve the performance of healthcare data classification. First, CCFS improves search effectiveness through a new updating mechanism that introduces a feature confidence, which explicitly accounts for the fine-grained impact of each dimension of a particle on classification performance. The feature confidence is composed of two measurements: the correlation between a feature and the categories, and the frequency with which each feature has historically been selected. Second, because the acquisition costs of different features naturally differ, especially for medical data, and should be fully taken into account in practical applications, the fitness function incorporates the feature cost and the feature reduction ratio in addition to the classification performance. The proposed method has been verified on various UCI public datasets and compared with various benchmark schemes. The experimental results show the effectiveness of the proposed method in terms of accuracy and feature selection cost.
KW - Data classification
KW - binary particle swarm optimization
KW - feature selection
KW - swarm intelligence
UR - http://www.scopus.com/inward/record.url?scp=85107465324&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107465324&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2019.2903804
DO - 10.1109/TCBB.2019.2903804
M3 - Article
C2 - 30843850
AN - SCOPUS:85107465324
SN - 1545-5963
VL - 18
SP - 902
EP - 911
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 3
M1 - 8662586
ER -