TY - GEN
T1 - Enriching multilingual language resources by discovering missing cross-language links in Wikipedia
AU - Oh, Jong Hoon
AU - Kawahara, Daisuke
AU - Uchimoto, Kiyotaka
AU - Kazama, Jun'ichi
AU - Torisawa, Kentaro
PY - 2008
Y1 - 2008
N2 - We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates of missing cross-language links - a pair of English and Japanese Wikipedia articles, which could be connected by cross-language links. Then we select the correct cross-language links among the candidates by using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, our method can discover cross-language links with high accuracy (92% precision with 78% recall rates). Second, the features used in a classifier are language-independent. Third, without relying on any external knowledge, we generate the features based on resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links from Wikipedia, which are almost two-thirds as many as the existing cross-language links in Wikipedia.
AB - We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates of missing cross-language links - a pair of English and Japanese Wikipedia articles, which could be connected by cross-language links. Then we select the correct cross-language links among the candidates by using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, our method can discover cross-language links with high accuracy (92% precision with 78% recall rates). Second, the features used in a classifier are language-independent. Third, without relying on any external knowledge, we generate the features based on resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links from Wikipedia, which are almost two-thirds as many as the existing cross-language links in Wikipedia.
UR - http://www.scopus.com/inward/record.url?scp=62949243450&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=62949243450&partnerID=8YFLogxK
U2 - 10.1109/WIIAT.2008.317
DO - 10.1109/WIIAT.2008.317
M3 - Conference contribution
AN - SCOPUS:62949243450
SN - 9780769534961
T3 - Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
SP - 322
EP - 328
BT - Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
T2 - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
Y2 - 9 December 2008 through 12 December 2008
ER -