TY - JOUR
T1 - A sequence walking system for genome analysis
AU - Miura, Teruhisa
AU - Takase, Toshirou
AU - Ishida, Toru
PY - 2003/1/1
Y1 - 2003/1/1
N2 - With the progress of the human genome analysis project, it is becoming possible to utilize large-scale genome sequence data. One genome analysis method based on large-scale sequence data is genome sequence walking. By applying sequence walking to a database of sequence segments, it is possible to estimate the whole sequence of the gene to which a query sequence belongs from its gene segments. Sequence walking lets the researcher estimate the genome sequence without biological experiments, which saves time and expense in sequence determination. Sequence walking has so far been performed using the well-known BLAST. BLAST, however, is a similarity search tool and is not well suited to sequence walking, in which segments of the same gene are connected, in terms of either efficiency or accuracy. This study shows that genome sequence walking is not a similarity search problem but a string matching problem that permits errors. A system dedicated to sequence walking is constructed by improving a string matching algorithm so that it is better suited to sequence walking. The resulting system has been made publicly available on the WWW. The proposed system performs sequence walking faster and more accurately than conventional sequence walking with BLAST, thus reducing the burden on the researcher.
AB - With the progress of the human genome analysis project, it is becoming possible to utilize large-scale genome sequence data. One genome analysis method based on large-scale sequence data is genome sequence walking. By applying sequence walking to a database of sequence segments, it is possible to estimate the whole sequence of the gene to which a query sequence belongs from its gene segments. Sequence walking lets the researcher estimate the genome sequence without biological experiments, which saves time and expense in sequence determination. Sequence walking has so far been performed using the well-known BLAST. BLAST, however, is a similarity search tool and is not well suited to sequence walking, in which segments of the same gene are connected, in terms of either efficiency or accuracy. This study shows that genome sequence walking is not a similarity search problem but a string matching problem that permits errors. A system dedicated to sequence walking is constructed by improving a string matching algorithm so that it is better suited to sequence walking. The resulting system has been made publicly available on the WWW. The proposed system performs sequence walking faster and more accurately than conventional sequence walking with BLAST, thus reducing the burden on the researcher.
KW - BLAST
KW - Genome analysis
KW - Genome sequence walking
KW - String matching algorithm
UR - http://www.scopus.com/inward/record.url?scp=0037237022&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037237022&partnerID=8YFLogxK
U2 - 10.1002/ecjb.10113
DO - 10.1002/ecjb.10113
M3 - Article
AN - SCOPUS:0037237022
SN - 8756-663X
VL - 86
SP - 64
EP - 72
JO - Electronics and Communications in Japan, Part II: Electronics (English translation of Denshi Tsushin Gakkai Ronbunshi)
JF - Electronics and Communications in Japan, Part II: Electronics (English translation of Denshi Tsushin Gakkai Ronbunshi)
IS - 1
ER -