Sentence embedding and fine-tuning to automatically identify duplicate bugs

Haruna Isotani, Hironori Washizaki*, Yoshiaki Fukazawa, Tsutomu Nomoto, Saori Ouji, Shinobu Saito

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Industrial software maintenance is critical but burdensome. Activities such as detecting duplicate bug reports are often performed manually. This study proposes an automated duplicate bug report detection system to improve maintenance efficiency. The system uses vectorization of the contents and deep learning–based sentence embedding to calculate the similarity of whole reports from the vectors of their individual elements. Specifically, sentence embedding is realized by fine-tuning Sentence-BERT. The system is validated by experimentally comparing its performance with baseline methods. The proposed system detects duplicate bug reports more effectively than existing methods.
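
The idea of scoring a whole report from the vectors of its individual elements can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed` function is a hypothetical stand-in (a hashed bag-of-words vector) for the fine-tuned Sentence-BERT encoder, the field names `title` and `description` are assumed, and the weighted average used to combine per-field similarities is one plausible choice that the abstract does not specify.

```python
import math
import re
import zlib


def embed(text, dim=256):
    """Hypothetical stand-in for a sentence encoder such as Sentence-BERT.

    Maps each lowercase token to a bucket with a deterministic hash and
    counts occurrences, giving a fixed-length vector.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def report_similarity(report_a, report_b, weights):
    """Combine per-field cosine similarities into one report-level score.

    `weights` maps a field name to its (assumed) importance; the score is
    the weighted average of the field-wise similarities.
    """
    total = sum(weights.values())
    score = 0.0
    for field, weight in weights.items():
        vec_a = embed(report_a.get(field, ""))
        vec_b = embed(report_b.get(field, ""))
        score += weight * cosine(vec_a, vec_b)
    return score / total


# Illustrative reports; the contents and weights are made up for this sketch.
WEIGHTS = {"title": 0.5, "description": 0.5}
report_1 = {
    "title": "App crashes when saving file",
    "description": "The application crashes when the user saves a file",
}
report_2 = {
    "title": "App crashes when saving a file",
    "description": "The application crashes when the user saves a large file",
}
report_3 = {
    "title": "Login button misaligned",
    "description": "Button overlaps footer on mobile screens",
}

near_duplicate_score = report_similarity(report_1, report_2, WEIGHTS)
unrelated_score = report_similarity(report_1, report_3, WEIGHTS)
```

In a real pipeline, `embed` would be replaced by the fine-tuned sentence encoder, and candidate duplicates would be ranked by `report_similarity` against each incoming report.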

Original language: English
Article number: 1032452
Journal: Frontiers in Computer Science
Publication status: Published - 2023 Jan 19


Keywords

  • BERT
  • bug reports
  • duplicate detection
  • information retrieval
  • natural language processing
  • sentence embedding

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

