Automatically extracting personal name aliases from the web

Danushka Bollegala*, Taiki Honma, Yutaka Matsuo, Mitsuru Ishizuka

*Corresponding author for this work

Research output: Conference contribution

5 Citations (Scopus)

Abstract

Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores to measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to leverage a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
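The abstract reports its headline result as a mean reciprocal rank (MRR). As a minimal sketch of the standard definition of that measure, the snippet below averages, over all query names, the reciprocal of the rank at which the first correct alias appears. The function names and the toy data are hypothetical and not taken from the paper, and scoring only the first correct alias per name is an assumption about how the ranking is evaluated.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], correct: Set[str]) -> float:
    """Return 1/rank of the first correct alias in `ranked`, or 0.0 if none appears."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Average the reciprocal rank over every name in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, gold.get(name, set()))
        for name, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data (placeholders, not from the paper's dataset).
rankings = {
    "name_a": ["alias_x", "alias_a", "alias_y"],  # first correct alias at rank 2
    "name_b": ["alias_b", "alias_z"],             # first correct alias at rank 1
    "name_c": ["alias_q", "alias_r"],             # no correct alias retrieved
}
gold = {
    "name_a": {"alias_a"},
    "name_b": {"alias_b"},
    "name_c": {"alias_c"},
}

mrr = mean_reciprocal_rank(rankings, gold)
print(round(mrr, 4))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```

Under this definition, an MRR near 0.67 would be obtained, for example, if the first correct alias sat at rank 1 for some names and at rank 2 or lower for others; the score alone does not say how the ranks are distributed.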

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 77-88
Number of pages: 12
Volume: 5221 LNAI
DOI
Publication status: Published - 2008
Externally published: Yes
Event: 6th International Conference on Natural Language Processing, GoTAL 2008 - Gothenburg
Duration: 25 Aug 2008 to 27 Aug 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5221 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Natural Language Processing, GoTAL 2008
City: Gothenburg
Period: 25 Aug 2008 to 27 Aug 2008

ASJC Scopus subject areas

  • General Computer Science
  • Theoretical Computer Science
