TY - JOUR
T1 - Improvement of detection performance of fusion genes from RNA-seq data by clustering short reads
AU - Sota, Yoshiaki
AU - Seno, Shigeto
AU - Shigeta, Hironori
AU - Osato, Naoki
AU - Shimoda, Masafumi
AU - Noguchi, Shinzaburo
AU - Matsuda, Hideo
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant No. JP18H04124.
Publisher Copyright:
© 2019 The Author(s).
PY - 2019/6/1
Y1 - 2019/6/1
N2 - Fusion genes are involved in cancer, but their detection from RNA-Seq data is limited by the relatively short read length. We therefore propose a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same loci and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Fusion genes were then investigated using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. With a one-base shift, the proportion of reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average (BT-474: 48% to 57%, MCF-7: 49% to 53%, SKBR-3: 50% to 57%, and T-47D: 43% to 50%) compared with the original data. With larger shifts, the positive predictive value also improved. The SSC method could therefore be effective for fusion gene detection.
AB - Fusion genes are involved in cancer, but their detection from RNA-Seq data is limited by the relatively short read length. We therefore propose a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same loci and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Fusion genes were then investigated using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. With a one-base shift, the proportion of reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average (BT-474: 48% to 57%, MCF-7: 49% to 53%, SKBR-3: 50% to 57%, and T-47D: 43% to 50%) compared with the original data. With larger shifts, the positive predictive value also improved. The SSC method could therefore be effective for fusion gene detection.
KW - Cancer
KW - RNA-seq
KW - SlideSort
KW - fusion gene
UR - http://www.scopus.com/inward/record.url?scp=85068994729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068994729&partnerID=8YFLogxK
U2 - 10.1142/S0219720019400080
DO - 10.1142/S0219720019400080
M3 - Article
C2 - 31288642
AN - SCOPUS:85068994729
SN - 0219-7200
VL - 17
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
IS - 3
M1 - 1940008
ER -