A study of Bayesian clustering of a document set based on GA

Keiko Aoki, Kazunori Matsumoto, Keiichiro Hoashi, Kazuo Hashimoto

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, making a cluster from these, and assigning the other documents to the nearest node; if the number of documents assigned to a node is large, sampling and clustering continue from the top down. To improve the precision of the top-down clustering method, we propose selecting documents by applying a genetic algorithm (GA) to decide a quasi-optimum layer and using a minimum description length (MDL) criterion to evaluate the layer structure of a cluster tree.
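
The sample, assign, and recurse loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes documents are already numeric vectors, treats each sampled document as its own node instead of clustering the sample first, uses plain random sampling where the paper applies a GA, and omits the MDL evaluation entirely. The names `top_down_cluster`, `leaves`, `sample_size`, and `max_leaf` are hypothetical.

```python
import random


def distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_down_cluster(docs, sample_size=3, max_leaf=4, rng=None):
    """Build a cluster tree over `docs`, a list of numeric vectors.

    Returns nested dicts of the form {"docs": [indices], "children": [nodes]}.
    """
    rng = rng or random.Random(0)

    def build(indices):
        node = {"docs": indices, "children": []}
        # A node with few documents is kept as a leaf.
        if len(indices) <= max_leaf:
            return node
        # Assumption: random sampling stands in for the GA-based selection.
        seeds = rng.sample(indices, min(sample_size, len(indices)))
        groups = {s: [] for s in seeds}
        # Assign every document to its nearest sampled document.
        for i in indices:
            nearest = min(seeds, key=lambda s: distance(docs[i], docs[s]))
            groups[nearest].append(i)
        # Stop if the split made no progress (e.g. all documents identical).
        if max(len(g) for g in groups.values()) == len(indices):
            return node
        # Nodes that received many documents are split again, top down.
        node["children"] = [build(g) for g in groups.values() if g]
        return node

    return build(list(range(len(docs))))


def leaves(node):
    """Collect the document-index lists stored at the leaves of the tree."""
    if not node["children"]:
        return [node["docs"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

Calling `top_down_cluster(vectors)` on a list of document vectors returns the tree, and `leaves(tree)` lists the resulting leaf clusters. In the paper's method, the random choice of `seeds` is the step the GA would replace, with the MDL criterion scoring candidate layer structures.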

Original language: English
Title of host publication: Simulated Evolution and Learning - 2nd Asia-Pacific Conference on Simulated Evolution and Learning, SEAL 1998, Selected Papers
Editors: Bob McKay, Xin Yao, Charles S. Newton, Jong-Hwan Kim, Takeshi Furuhashi
Publisher: Springer Verlag
Pages: 260-267
Number of pages: 8
ISBN (Print): 3540659072, 9783540659075
DOIs
Publication status: Published - 1999
Externally published: Yes
Event: 2nd Asia-Pacific Conference on Simulated Evolution and Learning, SEAL 1998 - Canberra, Australia
Duration: 1998 Nov 24 to 1998 Nov 27

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1585
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd Asia-Pacific Conference on Simulated Evolution and Learning, SEAL 1998
Country/Territory: Australia
City: Canberra
Period: 98/11/24 to 98/11/27

Keywords

  • Bayesian clustering
  • Document retrieval
  • Genetic algorithm
  • Minimum description length criteria

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
