Tracing CVE vulnerability information to CAPEC attack patterns using natural language processing techniques

Kenta Kanakogi*, Hironori Washizaki, Yoshiaki Fukazawa, Shinpei Ogata, Takao Okubo, Takehisa Kato, Hideyuki Kanuka, Atsuo Hazeyama, Nobukazu Yoshioka

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


For effective vulnerability management, vulnerability and attack information must be collected quickly and efficiently. A security knowledge repository can collect such information. The Common Vulnerabilities and Exposures (CVE) provides known vulnerabilities of products, while the Common Attack Pattern Enumeration and Classification (CAPEC) stores attack patterns, which are descriptions of common attributes and approaches employed by adversaries to exploit known weaknesses. Because the information in these two repositories is not linked, identifying related CAPEC attack information from CVE vulnerability information is challenging. Currently, the related CAPEC-ID can be traced from the CVE-ID using Common Weakness Enumeration (CWE) in some but not all cases. Here, we propose a method to automatically trace the related CAPEC-IDs from a CVE-ID using three similarity measures: TF–IDF, Universal Sentence Encoder (USE), and Sentence-BERT (SBERT). We prepared 58 CVE-IDs as test input data and tested whether we could trace the CAPEC-IDs related to each of them. We experimentally confirm that TF–IDF is the best similarity measure, as it traced 48 of the 58 CVE-IDs to the related CAPEC-ID.
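
The TF–IDF tracing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cosine similarity over TF–IDF vectors, a simple regex tokenizer, a small hand-picked stop-word list, and smoothed IDF, none of which are specified in the abstract. The function names (`tfidf_vectors`, `cosine`, `rank_patterns`), the `PATTERN-*` identifiers, and all description texts are hypothetical toy data, not real CVE or CAPEC entries. The sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter

# Assumed stop-word list (illustrative only; not taken from the paper).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "or", "and",
              "is", "that", "via", "into", "through"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms that occur in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (n / len(tokens)) * idf[t] for t, n in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_patterns(cve_description, patterns, top_k=3):
    """Rank attack-pattern descriptions by similarity to a CVE description.

    The CVE text is included in the corpus used to compute IDF weights,
    which is a simplification made for this sketch.
    """
    ids = list(patterns)
    vectors = tfidf_vectors([cve_description] + [patterns[i] for i in ids])
    query, docs = vectors[0], vectors[1:]
    scored = [(pid, cosine(query, vec)) for pid, vec in zip(ids, docs)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: identifiers and texts are made up for illustration.
TOY_PATTERNS = {
    "PATTERN-SQLI": "An attacker injects crafted SQL statements through user "
                    "input to read or modify database records.",
    "PATTERN-XSS": "An attacker injects malicious script into a web page "
                   "that is executed in the browser of a victim.",
    "PATTERN-OVERFLOW": "An attacker writes data past the bounds of a memory "
                        "buffer to crash the program or execute code.",
}
CVE_TEXT = ("SQL injection in the login form allows remote attackers to run "
            "arbitrary SQL statements via the username parameter.")

for pattern_id, score in rank_patterns(CVE_TEXT, TOY_PATTERNS):
    print(f"{pattern_id}: {score:.3f}")
```

With this toy data, the SQL-injection entry ranks first because it is the only description that shares non-stop-word terms with the CVE text. A real pipeline would substitute the full CVE and CAPEC description texts, and could swap the TF–IDF vectors for USE or SBERT sentence embeddings while keeping the same ranking step.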

Original language: English
Article number: 298
Journal: Information (Switzerland)
Issue number: 8
Publication status: Published - 2021 Aug

Keywords


  • CVE
  • Natural language processing
  • Security knowledge repository
  • Sentence embeddings
  • Sentence-BERT
  • TF–IDF
  • Universal sentence encoder

ASJC Scopus subject areas

  • Information Systems
