TY - JOUR
T1 - Statistical estimation of the names of HTTPS servers with domain name graphs
AU - Mori, Tatsuya
AU - Inoue, Takeru
AU - Shimoda, Akihiro
AU - Sato, Kazumichi
AU - Harada, Shigeaki
AU - Ishibashi, Keisuke
AU - Goto, Shigeki
N1 - Funding Information:
We thank the workshop chairs of the 7th International Workshop on Traffic Monitoring and Analysis (TMA 2015) for providing us with the opportunity to publish the extended version of our work. A part of this work was supported by JSPS KAKENHI Grant number 25880020.
Publisher Copyright:
© 2016 Elsevier B.V.
PY - 2016
Y1 - 2016
AB - Adoption of SSL/TLS to protect the privacy of web users has become increasingly common. In fact, as of September 2015, more than 68% of the top-1M websites deploy SSL/TLS to encrypt their traffic. The transition from HTTP to HTTPS has brought a new challenge for network operators, who need to know the hostnames behind encrypted web traffic for various reasons. To meet this challenge, this work develops a novel framework called SFMap, which statistically estimates the names of HTTPS servers by analyzing the preceding DNS queries/responses. The SFMap framework introduces the domain name graph, which can characterize the highly dynamic and diverse nature of DNS mechanisms. Such complexity arises from the recent deployment and implementation of DNS ecosystems, i.e., canonical name tricks used by CDNs, the dynamic and diverse nature of DNS TTL settings, and incomplete and unpredictable measurements due to the existence of various DNS caching instances. First, we demonstrate that SFMap achieves good estimation accuracy and outperforms a state-of-the-art approach. We also aim to identify the optimal setting of the SFMap framework. Next, based on the preliminary analysis, we introduce techniques that make the SFMap framework scalable to large-scale traffic data. We validate the effectiveness of the approach using large-scale Internet traffic.
KW - DNS
KW - Graph
KW - SSL/TLS
KW - Traffic analysis
UR - http://www.scopus.com/inward/record.url?scp=84959520200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959520200&partnerID=8YFLogxK
U2 - 10.1016/j.comcom.2016.01.013
DO - 10.1016/j.comcom.2016.01.013
M3 - Article
AN - SCOPUS:84959520200
SN - 0140-3664
VL - 94
SP - 104
EP - 113
JO - Computer Communications
JF - Computer Communications
ER -