Abstract
Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure derived from web counts. Each pair of words is submitted as a query to a search engine, and the resulting hit counts form a co-occurrence matrix. By calculating the similarity of words, a word co-occurrence graph is obtained. A new kind of graph clustering algorithm called Newman clustering is applied to identify word clusters efficiently. Evaluations are made on two sets of word groups derived from a web directory and WordNet.
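The pipeline described in the abstract (pairwise web counts, a similarity-weighted word graph, then graph clustering) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: the hit counts are made-up numbers standing in for search engine results, Jaccard similarity stands in for the unspecified similarity measure, the `threshold` parameter is hypothetical, and "Newman clustering" is assumed to mean Newman's greedy modularity agglomeration. The function names are illustrative only.

```python
def jaccard_graph(hits, pair_hits, threshold=0.1):
    """Build a weighted word graph from hit counts.

    Jaccard similarity and the threshold are assumptions for illustration.
    """
    edges = {}
    for (u, v), both in pair_hits.items():
        union = hits[u] + hits[v] - both
        sim = both / union if union > 0 else 0.0
        if sim >= threshold:
            edges[frozenset((u, v))] = sim
    return edges


def greedy_modularity_clusters(nodes, edges):
    """Greedy agglomeration: merge the community pair with the largest
    positive modularity gain until no merge improves modularity."""
    communities = {n: {n} for n in nodes}
    total = sum(edges.values())
    if total == 0:
        return list(communities.values())
    # e[u][v]: half the fraction of edge weight joining communities u and v.
    # a[u]: fraction of edge-end weight attached to community u.
    e = {n: {} for n in nodes}
    a = {n: 0.0 for n in nodes}
    for pair, weight in edges.items():
        u, v = tuple(pair)
        frac = weight / (2 * total)
        e[u][v] = e[u].get(v, 0.0) + frac
        e[v][u] = e[v].get(u, 0.0) + frac
        a[u] += frac
        a[v] += frac
    while True:
        best_gain, best_pair = 0.0, None
        for u in e:
            for v, e_uv in e[u].items():
                gain = 2 * (e_uv - a[u] * a[v])  # modularity change if merged
                if gain > best_gain:
                    best_gain, best_pair = gain, (u, v)
        if best_pair is None:
            break
        u, v = best_pair
        # Merge community v into community u and rewire v's links.
        for k, weight in e.pop(v).items():
            del e[k][v]
            if k != u:
                e[u][k] = e[u].get(k, 0.0) + weight
                e[k][u] = e[k].get(u, 0.0) + weight
        a[u] += a.pop(v)
        communities[u] |= communities.pop(v)
    return list(communities.values())


# Hypothetical hit counts; a real run would obtain these from a search engine.
words = ["apple", "banana", "cherry", "python", "java", "rust"]
hits = {w: 1000 for w in words}
pair_hits = {
    ("apple", "banana"): 400, ("apple", "cherry"): 400, ("banana", "cherry"): 400,
    ("python", "java"): 400, ("python", "rust"): 400, ("java", "rust"): 400,
    ("cherry", "python"): 250,  # weak cross-group link, kept by the threshold
    ("apple", "rust"): 10,      # dropped by the threshold
}
graph = jaccard_graph(hits, pair_hits)
clusters = greedy_modularity_clusters(words, graph)
print(sorted(sorted(c) for c in clusters))
```

The sketch scans all connected community pairs on each merge step, which is fine for small vocabularies; larger graphs would call for the heap-based bookkeeping used in optimized implementations.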
Original language | English |
---|---|
Title of host publication | COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference |
Pages | 542-550 |
Number of pages | 9 |
Publication status | Published - 2006 |
Externally published | Yes |
Event | 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006 - Sydney, NSW. Duration: 22 Jul 2006 → 23 Jul 2006 |
Other
Other | 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006 |
---|---|
City | Sydney, NSW |
Period | 22 Jul 2006 → 23 Jul 2006
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems