TY - JOUR
T1 - Efficient privacy-preserving string search and an application in genomics
AU - Shimizu, Kana
AU - Nuida, Koji
AU - Rätsch, Gunnar
N1 - Funding Information:
Funding: We gratefully acknowledge funding from AIST (to K.S.), Memorial Sloan Kettering Cancer Center (to G.R.) and NIH (grant 1R01CA176785-01A1). This study was also supported by the Japan-Finland Cooperative Scientific Research Program of Japan Science and Technology Agency (JST; to K.S.).
Publisher Copyright:
© 2016 The Author 2016. Published by Oxford University Press.
PY - 2016/6/1
Y1 - 2016/6/1
N2 - Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Supplementary information: Supplementary data are available at Bioinformatics online.
AB - Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=84973376088&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973376088&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btw050
DO - 10.1093/bioinformatics/btw050
M3 - Article
C2 - 27153731
AN - SCOPUS:84973376088
SN - 1367-4803
VL - 32
SP - 1652
EP - 1661
JO - Bioinformatics
JF - Bioinformatics
IS - 11
ER -