TY - GEN
T1 - Improving Text Classification Using Knowledge in Labels
AU - Zhang, Cheng
AU - Yamana, Hayato
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/3/5
Y1 - 2021/3/5
N2 - Various algorithms and models have been proposed to address text classification tasks; however, they rarely consider incorporating the additional knowledge hidden in class labels. We argue that hidden information in class labels leads to better classification accuracy. In this study, instead of encoding the labels into numerical values, we incorporated the knowledge in the labels into the original model without changing the model architecture. We combined the output of an original classification model with the relatedness calculated based on the embeddings of a sequence and a keyword set. A keyword set is a word set to represent knowledge in the labels. Usually, it is generated from the classes while it could also be customized by the users. The experimental results show that our proposed method achieved statistically significant improvements in text classification tasks. The source code and experimental details of this study can be found on GitHub (https://github.com/HeroadZ/KiL).
AB - Various algorithms and models have been proposed to address text classification tasks; however, they rarely consider incorporating the additional knowledge hidden in class labels. We argue that hidden information in class labels leads to better classification accuracy. In this study, instead of encoding the labels into numerical values, we incorporated the knowledge in the labels into the original model without changing the model architecture. We combined the output of an original classification model with the relatedness calculated based on the embeddings of a sequence and a keyword set. A keyword set is a word set to represent knowledge in the labels. Usually, it is generated from the classes while it could also be customized by the users. The experimental results show that our proposed method achieved statistically significant improvements in text classification tasks. The source code and experimental details of this study can be found on GitHub (https://github.com/HeroadZ/KiL).
KW - bert
KW - deep learning
KW - natural language processing
KW - text classification
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85105344639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105344639&partnerID=8YFLogxK
U2 - 10.1109/ICBDA51983.2021.9403092
DO - 10.1109/ICBDA51983.2021.9403092
M3 - Conference contribution
AN - SCOPUS:85105344639
T3 - 2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021
SP - 193
EP - 197
BT - 2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE International Conference on Big Data Analytics, ICBDA 2021
Y2 - 5 March 2021 through 8 March 2021
ER -