TY - JOUR
T1 - Tracing CVE vulnerability information to CAPEC attack patterns using natural language processing techniques
AU - Kanakogi, Kenta
AU - Washizaki, Hironori
AU - Fukazawa, Yoshiaki
AU - Ogata, Shinpei
AU - Okubo, Takao
AU - Kato, Takehisa
AU - Kanuka, Hideyuki
AU - Hazeyama, Atsuo
AU - Yoshioka, Nobukazu
N1 - Funding Information:
Funding: This research was supported by the SCAT Research Grant, the MEXT enPiT-Pro Smart SE: Smart Systems and Services innovative professional Education program, and the JST-Mirai Program grant number JPMJMI20B8.
Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/8
Y1 - 2021/8
N2 - For effective vulnerability management, vulnerability and attack information must be collected quickly and efficiently. A security knowledge repository can collect such information. The Common Vulnerabilities and Exposures (CVE) provides known vulnerabilities of products, while the Common Attack Pattern Enumeration and Classification (CAPEC) stores attack patterns, which are descriptions of common attributes and approaches employed by adversaries to exploit known weaknesses. Because the information in these two repositories is not linked, identifying related CAPEC attack information from CVE vulnerability information is challenging. Currently, the related CAPEC-IDs can be traced from a CVE-ID via the Common Weakness Enumeration (CWE) in some, but not all, cases. Here, we propose a method to automatically trace the related CAPEC-IDs from a CVE-ID using three similarity measures: TF–IDF, Universal Sentence Encoder (USE), and Sentence-BERT (SBERT). We prepared 58 CVE-IDs as test input data and tested whether we could trace the CAPEC-IDs related to each of them. Additionally, we experimentally confirm that TF–IDF is the best similarity measure, as it traced 48 of the 58 CVE-IDs to the related CAPEC-ID.
AB - For effective vulnerability management, vulnerability and attack information must be collected quickly and efficiently. A security knowledge repository can collect such information. The Common Vulnerabilities and Exposures (CVE) provides known vulnerabilities of products, while the Common Attack Pattern Enumeration and Classification (CAPEC) stores attack patterns, which are descriptions of common attributes and approaches employed by adversaries to exploit known weaknesses. Because the information in these two repositories is not linked, identifying related CAPEC attack information from CVE vulnerability information is challenging. Currently, the related CAPEC-IDs can be traced from a CVE-ID via the Common Weakness Enumeration (CWE) in some, but not all, cases. Here, we propose a method to automatically trace the related CAPEC-IDs from a CVE-ID using three similarity measures: TF–IDF, Universal Sentence Encoder (USE), and Sentence-BERT (SBERT). We prepared 58 CVE-IDs as test input data and tested whether we could trace the CAPEC-IDs related to each of them. Additionally, we experimentally confirm that TF–IDF is the best similarity measure, as it traced 48 of the 58 CVE-IDs to the related CAPEC-ID.
KW - CAPEC
KW - CVE
KW - Natural language processing
KW - Security knowledge repository
KW - Sentence embeddings
KW - Sentence-BERT
KW - TF–IDF
KW - Universal Sentence Encoder
UR - http://www.scopus.com/inward/record.url?scp=85111738611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111738611&partnerID=8YFLogxK
U2 - 10.3390/info12080298
DO - 10.3390/info12080298
M3 - Article
AN - SCOPUS:85111738611
SN - 2078-2489
VL - 12
JO - Information (Switzerland)
JF - Information (Switzerland)
IS - 8
M1 - 298
ER -