TY - GEN
T1 - Genetic network programming with parallel processing for association rule mining in large and dense databases
AU - Gonzales, Eloy
AU - Shimada, Kaoru
AU - Mabu, Shingo
AU - Hirasawa, Kotaro
AU - Hu, Jinglu
PY - 2007
Y1 - 2007
N2 - Several methods of extracting association rules have been reported. A new evolutionary computation method named Genetic Network Programming (GNP) has also been developed recently, and its effectiveness has been shown for small datasets. However, it has not been tested on large datasets, particularly datasets with a large number of attributes. The aim of this paper is to extract association rules from large and dense datasets using GNP, considering a real-world database with a huge number of attributes. We propose a new method in which a large database is divided into many small datasets, and each GNP deals with one dataset with an appropriately sized set of attributes, which is selected randomly from the large dataset and generated genetically. These GNPs are processed in parallel. We then propose some new genetic operations to improve both the number and the quality of the extracted rules. The proposed method shows remarkable improvement in simulations. Fig. 1 shows the architecture of the proposed method. We use the CLIENT/SERVER model. The CLIENT side carries out preprocessing of the large database, assignment of files to each server, rule checking, and genetic operations on files. The SERVER side independently processes each file using the conventional GNP-based mining method. The features and advantages of the proposed method are the following: Rule extraction is done in parallel. Each file generates its own local pool of rules. Files or datasets are treated as individuals so that new genetic operations can be applied to them to improve rule extraction. Extracted rules are stored in a global pool. The rules are checked to avoid redundancy, ensuring that only new rules are stored.
AB - Several methods of extracting association rules have been reported. A new evolutionary computation method named Genetic Network Programming (GNP) has also been developed recently, and its effectiveness has been shown for small datasets. However, it has not been tested on large datasets, particularly datasets with a large number of attributes. The aim of this paper is to extract association rules from large and dense datasets using GNP, considering a real-world database with a huge number of attributes. We propose a new method in which a large database is divided into many small datasets, and each GNP deals with one dataset with an appropriately sized set of attributes, which is selected randomly from the large dataset and generated genetically. These GNPs are processed in parallel. We then propose some new genetic operations to improve both the number and the quality of the extracted rules. The proposed method shows remarkable improvement in simulations. Fig. 1 shows the architecture of the proposed method. We use the CLIENT/SERVER model. The CLIENT side carries out preprocessing of the large database, assignment of files to each server, rule checking, and genetic operations on files. The SERVER side independently processes each file using the conventional GNP-based mining method. The features and advantages of the proposed method are the following: Rule extraction is done in parallel. Each file generates its own local pool of rules. Files or datasets are treated as individuals so that new genetic operations can be applied to them to improve rule extraction. Extracted rules are stored in a global pool. The rules are checked to avoid redundancy, ensuring that only new rules are stored.
KW - Association rules
KW - Genetic network programming
KW - Parallel processing
UR - http://www.scopus.com/inward/record.url?scp=34548063771&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548063771&partnerID=8YFLogxK
U2 - 10.1145/1276958.1277241
DO - 10.1145/1276958.1277241
M3 - Conference contribution
AN - SCOPUS:34548063771
SN - 1595936971
SN - 9781595936974
T3 - Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference
SP - 1512
BT - Proceedings of GECCO 2007
T2 - 9th Annual Genetic and Evolutionary Computation Conference, GECCO 2007
Y2 - 7 July 2007 through 11 July 2007
ER -