Efficient privacy-preserving variable-length substring match for genome sequence

Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

Abstract

The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that V[V[… V[p] …]] is computed for a given depth of recursion where p is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.

Original languageEnglish
Article number9
JournalAlgorithms for Molecular Biology
Volume17
Issue number1
DOIs
Publication statusPublished - 2022 Dec

Keywords

  • FM-index
  • LCP array
  • Maximal exact match
  • Private genome sequence search
  • Secret sharing
  • Secure multiparty computation
  • Suffix array

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Biology
  • Computational Theory and Mathematics
  • Applied Mathematics

Fingerprint

Dive into the research topics of 'Efficient privacy-preserving variable-length substring match for genome sequence'. Together they form a unique fingerprint.

Cite this