TY - GEN
T1 - Detection of fusion genes from human breast cancer cell-line RNA-seq data using shifted short read clustering
AU - Sota, Yoshiaki
AU - Seno, Shigeto
AU - Shigeta, Hironori
AU - Osato, Naoki
AU - Shimoda, Masafumi
AU - Noguchi, Shinzaburo
AU - Matsuda, Hideo
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP18H04124.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/6
Y1 - 2018/12/6
AB - Fusion genes are one of the mechanisms of tumorigenesis, and their identification by RNA-Seq has attracted attention. Various methods for detecting fusion genes have been proposed, but their accuracy is not sufficient. One cause of this problem is the relatively short read length in RNA-Seq data. We therefore propose a method based on shifted short-read clustering (SSC) that, before mapping, identifies shifted reads of the same origin and extends them into representative sequences. We hypothesized that this would increase the percentage of uniquely mapped reads and improve the detection rate of fusion genes. To verify these hypotheses, we applied the SSC method to RNA-Seq data from three cell lines (BT-474, MCF-7, and SKBR-3). When only one base was shifted, the average read lengths of BT-474, MCF-7, and SKBR-3 were extended from 201 to 223 bases (111%), 201 to 214 bases (106%), and 201 to 213 bases (106%), respectively. We further evaluated the SSC method by comparing the results of a fusion gene detection tool, STAR-Fusion, with and without applying SSC to the reads. The percentages of uniquely mapped reads for BT-474, MCF-7, and SKBR-3 improved from 88% to 93%, 88% to 94%, and 92% to 95%, respectively. Finally, the fusion gene detection rates for BT-474, MCF-7, and SKBR-3 increased from 48% to 57%, 49% to 53%, and 50% to 53%, respectively. These results suggest that the SSC method is effective not only for improving the percentage of uniquely mapped reads but also for fusion gene detection.
KW - Cancer
KW - Fusion gene
KW - RNA-seq
KW - SlideSort
UR - http://www.scopus.com/inward/record.url?scp=85060394843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060394843&partnerID=8YFLogxK
U2 - 10.1109/BIBE.2018.00038
DO - 10.1109/BIBE.2018.00038
M3 - Conference contribution
AN - SCOPUS:85060394843
T3 - Proceedings - 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering, BIBE 2018
SP - 159
EP - 162
BT - Proceedings - 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering, BIBE 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Conference on Bioinformatics and Bioengineering, BIBE 2018
Y2 - 29 October 2018 through 31 October 2018
ER -