TY - GEN
T1 - A loop structure optimization targeting high-level synthesis of fast number theoretic transform
AU - Kawamura, Kazushi
AU - Yanagisawa, Masao
AU - Togawa, Nozomu
N1 - Funding Information:
ACKNOWLEDGMENT This work was supported by JST CREST Grant Number JPMJCR1503, Japan.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/5/9
Y1 - 2018/5/9
N2 - Multiplication with a large number of digits is heavily used when processing data encrypted by a fully homomorphic encryption, which is a bottleneck in computation time. An algorithm utilizing fast number theoretic transform (FNTT) is known as a high-speed multiplication algorithm and the further speeding up is expected by implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure included in a software description of FNTT so that the performance of the synthesized FNTT processor can be maximized. The loop structure optimization is considered in terms of loop flattening and trip count reduction. We implement a 65,536-point FNTT processor with the loop structure optimization on an FPGA, and demonstrate that it can be executed 6.9 times faster than the execution on a CPU.
AB - Multiplication with a large number of digits is heavily used when processing data encrypted by a fully homomorphic encryption, which is a bottleneck in computation time. An algorithm utilizing fast number theoretic transform (FNTT) is known as a high-speed multiplication algorithm and the further speeding up is expected by implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure included in a software description of FNTT so that the performance of the synthesized FNTT processor can be maximized. The loop structure optimization is considered in terms of loop flattening and trip count reduction. We implement a 65,536-point FNTT processor with the loop structure optimization on an FPGA, and demonstrate that it can be executed 6.9 times faster than the execution on a CPU.
KW - FPGA
KW - fully homomorphic encryption (FHE)
KW - high-level synthesis (HLS)
KW - loop optimization
KW - number theoretic transform (NTT)
UR - http://www.scopus.com/inward/record.url?scp=85047948083&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047948083&partnerID=8YFLogxK
U2 - 10.1109/ISQED.2018.8357273
DO - 10.1109/ISQED.2018.8357273
M3 - Conference contribution
AN - SCOPUS:85047948083
T3 - Proceedings - International Symposium on Quality Electronic Design, ISQED
SP - 106
EP - 111
BT - 2018 19th International Symposium on Quality Electronic Design, ISQED 2018
PB - IEEE Computer Society
T2 - 19th International Symposium on Quality Electronic Design, ISQED 2018
Y2 - 13 March 2018 through 14 March 2018
ER -