TY - JOUR
T1 - Efficient privacy-preserving variable-length substring match for genome sequence
AU - Nakagawa, Yoshiki
AU - Ohata, Satsuya
AU - Shimizu, Kana
N1 - Funding Information:
This work is partially supported by JST CREST Grant Number JPMJCR19F6, MEXT/JSPS KAKENHI grant number 19K12209 and 21H04871/21H05052. KS thanks Prof. Kunihiko Sadakane and Mr. Tomoki Uchiyama for giving the important comments for improving the paper.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that V[V[… V[p] …]] is computed for a given depth of recursion where p is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.
AB - The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that V[V[… V[p] …]] is computed for a given depth of recursion where p is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.
KW - FM-index
KW - LCP array
KW - Maximal exact match
KW - Private genome sequence search
KW - Secret sharing
KW - Secure multiparty computation
KW - Suffix array
UR - http://www.scopus.com/inward/record.url?scp=85128983641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128983641&partnerID=8YFLogxK
U2 - 10.1186/s13015-022-00211-1
DO - 10.1186/s13015-022-00211-1
M3 - Article
AN - SCOPUS:85128983641
SN - 1748-7188
VL - 17
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 9
ER -