TY - GEN
T1 - Solving Feature Sparseness in Text Classification using Core-Periphery Decomposition
AU - Cui, Xia
AU - Kojaku, Sadamori
AU - Masuda, Naoki
AU - Bollegala, Danushka
N1 - Publisher Copyright:
© 2018 Association for Computational Linguistics.
PY - 2018
Y1 - 2018
N2 - Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this problem, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which is subsequently decomposed into core-periphery (CP) pairs, and use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and performs competitively with pivot-based cross-domain sentiment classification methods.
AB - Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this problem, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which is subsequently decomposed into core-periphery (CP) pairs, and use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and performs competitively with pivot-based cross-domain sentiment classification methods.
UR - http://www.scopus.com/inward/record.url?scp=85089733203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089733203&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85089733203
T3 - NAACL HLT 2018 - Lexical and Computational Semantics, SEM 2018, Proceedings of the 7th Conference
SP - 255
EP - 264
BT - NAACL HLT 2018 - Lexical and Computational Semantics, SEM 2018, Proceedings of the 7th Conference
A2 - Nissim, Malvina
A2 - Berant, Jonathan
A2 - Lenci, Alessandro
PB - Association for Computational Linguistics (ACL)
T2 - 7th Joint Conference on Lexical and Computational Semantics, SEM 2018, co-located with NAACL HLT 2018
Y2 - 5 June 2018 through 6 June 2018
ER -