TY - GEN
T1 - Discovering similar malware samples using API call topics
AU - Fujino, Akinori
AU - Murakami, Junichi
AU - Mori, Tatsuya
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/7/14
Y1 - 2015/7/14
AB - To automate malware analysis, dynamic malware analysis systems have attracted increasing attention from both industry and research communities. Of the various logs collected by such systems, API calls are a very promising source of information for characterizing malware behavior. This work aims to extract similar malware samples automatically using the concept of 'API call topics,' each of which represents a set of API calls intrinsic to a specific group of malware samples. We first convert Win32 API calls into 'API words.' We then apply non-negative matrix factorization (NMF) clustering analysis to the corpus of extracted API words; NMF automatically generates the API call topics from these words. The contributions of this work are as follows. We present an unsupervised approach to extracting API call topics from a large corpus of API calls. Through analysis of API call logs collected from thousands of malware samples, we demonstrate that the extracted API call topics can detect similar malware samples. The proposed approach is expected to be useful for automating the analysis of the huge volume of logs collected by dynamic malware analysis systems.
UR - http://www.scopus.com/inward/record.url?scp=84943196475&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84943196475&partnerID=8YFLogxK
DO - 10.1109/CCNC.2015.7157960
M3 - Conference contribution
AN - SCOPUS:84943196475
T3 - 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015
SP - 140
EP - 147
BT - 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015
Y2 - 9 January 2015 through 12 January 2015
ER -