TY - GEN
T1 - Automatic labeling of the elements of a vulnerability report CVE with NLP
AU - Sumoto, Kensuke
AU - Kanakogi, Kenta
AU - Washizaki, Hironori
AU - Tsuda, Naohiko
AU - Yoshioka, Nobukazu
AU - Fukazawa, Yoshiaki
AU - Kanuka, Hideyuki
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities of software products and source code. If individual elements of CVE descriptions can be extracted and structured, then the data can be used to search and analyze CVE descriptions. Herein we propose a method to label each element in CVE descriptions by applying Named Entity Recognition (NER). For NER, we used BERT, a transformer-based natural language processing model. Using NER with machine learning can label information from CVE descriptions even if there are some distortions in the data. An experiment involving manually prepared label information for 1000 CVE descriptions shows that the labeling accuracy of the proposed method is about 0.81 for precision and about 0.89 for recall. In addition, we devise a way to train the data by dividing it into labels. Our proposed method can be used to label each element automatically from CVE descriptions.
KW - BERT
KW - CVE
KW - Technological
KW - named entity recognition
KW - natural language processing
KW - security knowledge repository
UR - http://www.scopus.com/inward/record.url?scp=85139040087&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139040087&partnerID=8YFLogxK
U2 - 10.1109/IRI54793.2022.00045
DO - 10.1109/IRI54793.2022.00045
M3 - Conference contribution
AN - SCOPUS:85139040087
T3 - Proceedings - 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science, IRI 2022
SP - 164
EP - 165
BT - Proceedings - 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science, IRI 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2022
Y2 - 9 August 2022 through 11 August 2022
ER -