Impact of human gene annotations on RNA-seq differential expression analysis

Yu Hamaguchi*, Chao Zeng, Michiaki Hamada*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Background: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear.

Results: Using “mappability”, a metric of the complexity of gene annotation, we compared three distinct human gene annotations (GENCODE, RefSeq, and NONCODE) and evaluated how mappability affected DE analysis. We found that mappability differed significantly among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically to the downstream steps of DE analysis.

Conclusions: We assessed how the complexity of gene annotations affects DE analysis using mappability. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that an approach that excludes unnecessary gene models from gene annotations improves the performance of DE analysis.

Original language: English
Article number: 730
Journal: BMC Genomics
Volume: 22
Issue number: 1
DOIs
Publication status: Published - 2021 Dec

Keywords

  • Benchmarking
  • Differential expression analysis
  • Gene annotation
  • RNA-seq

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
