TY - GEN
T1 - Automated web data mining using semantic analysis
AU - Dou, Wenxiang
AU - Hu, Jinglu
PY - 2012
Y1 - 2012
N2 - This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves the following two phases: First, it analyzes the data located at the leaf nodes of the DOM tree of the web page, generates semantic information vectors for the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses it to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
AB - This paper presents an automated approach to extracting product data from commercial web pages. Our web mining method involves the following two phases: First, it analyzes the data located at the leaf nodes of the DOM tree of the web page, generates semantic information vectors for the other nodes of the DOM tree, and finds the maximum repeated semantic vector pattern. Second, it identifies the product data region and data records, builds a product object template using a semantic tree matching technique, and uses it to extract all product data from the web page. The main contribution of this study is a fully automated approach to extracting product data from commercial sites without any user assistance. Experimental results show that the proposed technique is highly effective.
KW - Product data mining
KW - Web data extraction
KW - Web mining
UR - http://www.scopus.com/inward/record.url?scp=84872694899&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872694899&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-35527-1_45
DO - 10.1007/978-3-642-35527-1_45
M3 - Conference contribution
AN - SCOPUS:84872694899
SN - 9783642355264
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 539
EP - 551
BT - Advanced Data Mining and Applications - 8th International Conference, ADMA 2012, Proceedings
T2 - 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
Y2 - 15 December 2012 through 18 December 2012
ER -