TY - GEN
T1 - Inferring popularity of domain names with DNS traffic
T2 - 58th IEEE Global Communications Conference, GLOBECOM 2015
AU - Shimoda, Akihiro
AU - Ishibashi, Keisuke
AU - Sato, Kazumichi
AU - Tsujino, Masayuki
AU - Inoue, Takeru
AU - Shimura, Masaki
AU - Takebe, Takanori
AU - Takahashi, Kazuki
AU - Mori, Tatsuya
AU - Goto, Shigeki
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015
Y1 - 2015
N2 - Popularity ranking of Internet services is an important metric for network operators because it enables mid- to long-term planning of their network facilities and root cause analysis of unexpected traffic. Service-oriented traffic monitoring is very helpful for inferring popularity, and hence it has been attracting much attention from both researchers and practitioners. Lately, identifying the service of a given flow has become very difficult due to the rapid growth of CDNs and/or encrypted traffic, and some research works have employed the preceding DNS traffic as a hint. However, because of the DNS cache mechanism, the DNS message count deviates from the actual number of flows, which can greatly degrade the reliability of the ranking. We propose a theoretical model for inferring the number of user accesses per domain name by exploiting the characteristics of the DNS message count. To the best of our knowledge, this paper is the first attempt to formulate the effect of users' stub resolvers; previous studies focused on analyzing the effect of cache servers. We evaluated the precision of our model with a real dataset containing the traffic of thousands of users. Analyzing the top 50 domain names by number of users, we can infer the number of flows within a 24% error rate on average for 42 out of 50 FQDNs.
AB - Popularity ranking of Internet services is an important metric for network operators because it enables mid- to long-term planning of their network facilities and root cause analysis of unexpected traffic. Service-oriented traffic monitoring is very helpful for inferring popularity, and hence it has been attracting much attention from both researchers and practitioners. Lately, identifying the service of a given flow has become very difficult due to the rapid growth of CDNs and/or encrypted traffic, and some research works have employed the preceding DNS traffic as a hint. However, because of the DNS cache mechanism, the DNS message count deviates from the actual number of flows, which can greatly degrade the reliability of the ranking. We propose a theoretical model for inferring the number of user accesses per domain name by exploiting the characteristics of the DNS message count. To the best of our knowledge, this paper is the first attempt to formulate the effect of users' stub resolvers; previous studies focused on analyzing the effect of cache servers. We evaluated the precision of our model with a real dataset containing the traffic of thousands of users. Analyzing the top 50 domain names by number of users, we can infer the number of flows within a 24% error rate on average for 42 out of 50 FQDNs.
UR - http://www.scopus.com/inward/record.url?scp=84964896480&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964896480&partnerID=8YFLogxK
U2 - 10.1109/GLOCOM.2014.7417638
DO - 10.1109/GLOCOM.2014.7417638
M3 - Conference contribution
AN - SCOPUS:84964896480
T3 - 2015 IEEE Global Communications Conference, GLOBECOM 2015
BT - 2015 IEEE Global Communications Conference, GLOBECOM 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 December 2015 through 10 December 2015
ER -