TY - JOUR
T1 - A computer-based method of selecting clones for a full-length cDNA project
T2 - Simultaneous collection of negligibly redundant and variant cDNAs
AU - Osato, Naoki
AU - Itoh, Masayoshi
AU - Konno, Hideaki
AU - Kondo, Shinji
AU - Shibata, Kazuhiro
AU - Carninci, Piero
AU - Shiraki, Toshiyuki
AU - Shinagawa, Akira
AU - Arakawa, Takahiro
AU - Kikuchi, Shoshi
AU - Sato, Kouji
AU - Kawai, Jun
AU - Hayashizaki, Yoshihide
PY - 2002
Y1 - 2002
N2 - We describe a computer-based method that selects representative clones for full-length sequencing in a full-length cDNA project. Our method classifies end sequences using two kinds of criteria: grouping and clustering. Grouping places together variant cDNAs, family genes, and cDNAs with sequencing errors. Clustering separates those cDNA clones into distinct clusters. The full-length sequences of the clones selected by grouping are determined preferentially, and then the sequences selected by clustering are determined. Grouping reduced the number of rice cDNA clones for full-length sequencing to 21% and mouse cDNA clones to 25%. Rice full-length sequences selected by grouping showed a 1.07-fold redundancy. Mouse full-length sequences showed a 1.04-fold redundancy, a reduction of ∼30% relative to selection using our previous method. To estimate the coverage of unique genes, we used FANTOM (Functional Annotation of RIKEN Mouse cDNA Clones) clusters (the RIKEN Genome Exploration Research Group 2001). Grouping covered almost all unique genes (93% of FANTOM clusters), and clustering covered all genes. Our method is therefore useful for selecting appropriate representative clones for full-length sequencing, greatly reducing the cost, labor, and time the process requires.
UR - http://www.scopus.com/inward/record.url?scp=18444387124&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=18444387124&partnerID=8YFLogxK
U2 - 10.1101/gr.75202
DO - 10.1101/gr.75202
N1 - Article published online before print in June 2002
M3 - Article
C2 - 12097351
AN - SCOPUS:18444387124
SN - 1088-9051
VL - 12
SP - 1127
EP - 1134
JO - Genome Research
JF - Genome Research
IS - 7
ER -