TY - JOUR
T1 - Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods
T2 - Evaluation by large-scale real data
AU - Takitoh, Shuichi
AU - Fujii, Shogo
AU - Mase, Yoichi
AU - Takasaki, Junichi
AU - Yamazaki, Toshimasa
AU - Ohnishi, Yozo
AU - Yanagisawa, Masao
AU - Nakamura, Yusuke
AU - Kamatani, Naoyuki
N1 - Funding Information: The authors would like to acknowledge S. Shibata, S. Kato, K. Nakazono, N. Miyagawa and H. Higuchi for their excellent suggestions and comments. This study was supported by Grants-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
PY - 2007/2/15
Y1 - 2007/2/15
AB - Motivation: The Invader assay is a fluorescence-based high-throughput genotyping technology. If the output data from the Invader assay were classified automatically, then genotypes for individuals would be determined efficiently. However, existing classification methods do not necessarily yield results with the same accuracy as can be achieved by technicians. Our clustering algorithm, Genocluster, is intended to increase the proportion of data points that need not be manually corrected by technicians. Results: Genocluster worked well even when the number of clusters was unknown in advance and when there were only a few points in a cluster. The use of Genocluster enabled us to achieve an acceptance rate (proportion of assay results that did not need to be corrected by expert technicians) of 84.4% and a proportion of uncorrected points of 95.8%, as determined using the data from over 31 million points.
UR - http://www.scopus.com/inward/record.url?scp=33847335428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33847335428&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btl133
DO - 10.1093/bioinformatics/btl133
M3 - Article
C2 - 17301273
AN - SCOPUS:33847335428
SN - 1367-4803
VL - 23
SP - 408
EP - 413
JO - Bioinformatics
JF - Bioinformatics
IS - 4
ER -