Automatic annotation of ambiguous personal names on the web

Danushka Bollegala*, Yutaka Matsuo, Mitsuru Ishizuka

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Personal name disambiguation is an important task in social network extraction, the evaluation and integration of ontologies, information retrieval, cross-document coreference resolution, and word sense disambiguation. We propose an unsupervised method that automatically annotates people with ambiguous names on the Web using automatically extracted keywords. Given an ambiguous personal name, we first download text snippets for that name from a Web search engine. We then represent each instance of the ambiguous name by a term-entity model (TEM), a model we propose for representing the Web appearance of an individual. The TEM of a person captures named entities and attribute values that are useful for distinguishing that person from his or her namesakes (i.e., different people who share the same name). Next, we use group average agglomerative clustering to identify the instances of an ambiguous name that belong to the same person. Ideally, each cluster should represent a different namesake. In practice, however, the number of namesakes for a given ambiguous name cannot be known in advance. To circumvent this problem, we propose a novel cluster stopping criterion based on normalized cuts that determines the different people on the Web who share a given ambiguous name. Finally, we annotate each person with an ambiguous name using keywords selected from the corresponding cluster. We evaluate the proposed method on a data set of over 2,500 documents covering 200 different people for 20 ambiguous names. Experimental results show that the proposed method outperforms numerous baselines and previously proposed name disambiguation methods. Moreover, the extracted keywords reduce the ambiguity of a name in an information retrieval task, which underscores the usefulness of the proposed method in real-world scenarios.
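The clustering and annotation steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. Each name instance is approximated by a plain bag of weighted terms, standing in for a real TEM. Clusters are merged by group average cosine similarity. The stopping rule is a hypothetical normalized-cut test: merging stops when the two-way normalized cut, Ncut(A, B) = cut(A, B)/assoc(A, A∪B) + cut(A, B)/assoc(B, A∪B), between the best candidate pair falls below an assumed threshold `theta`. The paper's exact criterion is not given in the abstract. The keyword selection (top terms by summed weight per cluster) and all function names are likewise illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-in for a term-entity model (TEM): term -> weight.
Vector = dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_average(a: list[int], b: list[int], sim) -> float:
    """Mean pairwise similarity between members of clusters a and b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def ncut(a: list[int], b: list[int], sim) -> float:
    """Two-way normalized cut of the similarity subgraph induced by a and b."""
    cut = sum(sim[i][j] for i in a for j in b)
    if cut == 0:
        return 0.0  # no edges between a and b: perfectly separated
    # assoc(X, A∪B) = edges from X to B plus ordered internal edges of X.
    assoc_a = cut + sum(sim[i][j] for i in a for j in a if i != j)
    assoc_b = cut + sum(sim[i][j] for i in b for j in b if i != j)
    return cut / assoc_a + cut / assoc_b


def cluster(vectors: list[Vector], theta: float = 0.5) -> list[list[int]]:
    """Group average agglomerative clustering with an (assumed) Ncut stop rule."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest group average similarity.
        best, pair = -1.0, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = group_average(clusters[x], clusters[y], sim)
                if s > best:
                    best, pair = s, (x, y)
        x, y = pair
        # Stop when even the best candidate pair is already well separated.
        if ncut(clusters[x], clusters[y], sim) < theta:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


def annotate(vectors: list[Vector], clusters: list[list[int]], k: int = 2) -> list[list[str]]:
    """Label each cluster with its k terms of highest summed weight."""
    labels = []
    for c in clusters:
        total: Counter = Counter()
        for i in c:
            total.update(vectors[i])  # adds weights term by term
        ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
        labels.append([term for term, _ in ranked[:k]])
    return labels


# Toy snippets for one ambiguous name (invented data): two namesakes.
docs = [
    {"pop": 2, "album": 1, "thriller": 1},
    {"pop": 1, "album": 2, "moonwalk": 1},
    {"pop": 1, "thriller": 1, "moonwalk": 1},
    {"beer": 2, "writer": 1, "whisky": 1},
    {"beer": 1, "whisky": 2, "guide": 1},
]
groups = cluster(docs)
print(groups, annotate(docs, groups))
```

On this invented data the loop merges within each namesake's snippets and then stops because the two remaining clusters share no terms (their normalized cut is 0), producing two clusters labeled with their dominant terms. The threshold value and the separability test are design choices of this sketch only, not tuned or taken from the paper.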

Original language: English
Pages (from-to): 398-425
Number of pages: 28
Journal: Computational Intelligence
Volume: 28
Issue number: 3
Publication status: Published - 2012 Aug
Externally published: Yes

Keywords

  • automatic annotation
  • clustering
  • name disambiguation
  • Web mining

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Mathematics
