TY - JOUR
T1 - Detecting malware-infected devices using the HTTP header patterns
AU - Mizuno, Sho
AU - Hatada, Mitsuhiro
AU - Mori, Tatsuya
AU - Goto, Shigeki
N1 - Funding Information:
We thank Dr. Tatsuaki Kimura for his valuable comments on the automatic template generation algorithm. A part of this work was supported by JSPS Grant-in-Aid for Scientific Research B, Grant Number 16H02832.
Publisher Copyright:
© 2018 The Institute of Electronics, Information and Communication Engineers.
PY - 2018/5
Y1 - 2018/5
N2 - Damage caused by malware has become a serious problem. The recent rise in the spread of evasive malware has made it difficult to detect it at the pre-infection timing. Malware detection at post-infection timing is a promising approach that fills this gap. Given this background, this work aims to identify likely malware-infected devices from the measurement of Internet traffic. The advantage of the traffic-measurement-based approach is that it enables us to monitor a large number of endhosts. If we find an endhost as a source of malicious traffic, the endhost is likely a malware-infected device. Since the majority of malware today makes use of the web as a means to communicate with the C&C servers that reside on the external network, we leverage information recorded in the HTTP headers to discriminate between malicious and benign traffic. To make our approach scalable and robust, we develop the automatic template generation scheme that drastically reduces the amount of information to be kept while achieving the high accuracy of classification; since it does not make use of any domain knowledge, the approach should be robust against changes of malware. We apply several classifiers, which include machine learning algorithms, to the extracted templates and classify traffic into two categories: malicious and benign. Our extensive experiments demonstrate that our approach discriminates between malicious and benign traffic with up to 97.1% precision while maintaining the false positive rate below 1.0%.
AB - Damage caused by malware has become a serious problem. The recent rise in the spread of evasive malware has made it difficult to detect it at the pre-infection timing. Malware detection at post-infection timing is a promising approach that fills this gap. Given this background, this work aims to identify likely malware-infected devices from the measurement of Internet traffic. The advantage of the traffic-measurement-based approach is that it enables us to monitor a large number of endhosts. If we find an endhost as a source of malicious traffic, the endhost is likely a malware-infected device. Since the majority of malware today makes use of the web as a means to communicate with the C&C servers that reside on the external network, we leverage information recorded in the HTTP headers to discriminate between malicious and benign traffic. To make our approach scalable and robust, we develop the automatic template generation scheme that drastically reduces the amount of information to be kept while achieving the high accuracy of classification; since it does not make use of any domain knowledge, the approach should be robust against changes of malware. We apply several classifiers, which include machine learning algorithms, to the extracted templates and classify traffic into two categories: malicious and benign. Our extensive experiments demonstrate that our approach discriminates between malicious and benign traffic with up to 97.1% precision while maintaining the false positive rate below 1.0%.
KW - Automatic template generation
KW - Botnet detection
KW - HTTP header
KW - Malicious traffic
UR - http://www.scopus.com/inward/record.url?scp=85046298316&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046298316&partnerID=8YFLogxK
U2 - 10.1587/transinf.2017EDP7294
DO - 10.1587/transinf.2017EDP7294
M3 - Article
AN - SCOPUS:85046298316
SN - 0916-8532
VL - E101D
SP - 1370
EP - 1379
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 5
ER -