TY - JOUR
T1 - Optimized implementation for calculation and fast-update of Pfaffians installed to the open-source fermionic variational solver mVMC
AU - Xu, Ru Qing G.
AU - Okubo, Tsuyoshi
AU - Todo, Synge
AU - Imada, Masatoshi
N1 - Funding Information:
A64FX assembly kernels for BLIS used in this work were developed jointly with Stepan Nassyr from Forschungszentrum Jülich and the Science of High-Performance Computing (SHPC) group from the University of Texas at Austin. A64FX test results were collected under the “Program for Promoting Researches on the Supercomputer Fugaku” by MEXT with the project title “Basic Science for Emergence and Functionality in Quantum Matter: Innovative Strongly-Correlated Electron Science by Integration of ‘Fugaku’ and Frontier Experiments” (hp200132). Our development work also used the Isambard 2 UK National Tier-2 HPC Service (http://gw4.ac.uk/isambard/) operated by GW4 and the UK Met Office, and funded by EPSRC (EP/T022078/1). RGX acknowledges the Global Science Graduate Course (GSGC) program of the University of Tokyo. TO and ST acknowledge support from the Endowed Project for Quantum Software Research and Education, the University of Tokyo (https://qsw.phys.s.u-tokyo.ac.jp/).
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/8
Y1 - 2022/8
N2 - In this article, we present a high-performance, portable, and well-templated implementation for computing and fast-updating the Pfaffian and inverse of an even-ranked skew-symmetric (antisymmetric) matrix. This is achieved with a skew-symmetric, blocked variant of the Parlett-Reid algorithm and a blocked update scheme based on the Woodbury matrix identity. Installing this framework into the geminal-wavefunction-based many-variable Variational Monte Carlo (mVMC) code speeds up sampling by a factor of up to more than 6 without changing the Markov chain's behavior. The implementation is based on an extension of the BLAS-like Library Instantiation Software (BLIS) framework, which has optimized kernels for many state-of-the-art processors, including Intel Skylake-X, AMD EPYC Rome, and Fujitsu A64FX. Program summary. Program title: Pfaffine and PfUpdates library for mVMC [1]. CPC Library link to program files: https://doi.org/10.17632/rz9rs8cpws.1 Developer's repository link: https://github.com/issp-center-dev/mVMC/tree/master/src/pfupdates, https://github.com/xrq-phys/Pfaffine Licensing provisions: MPL-2.0 (for new Library part). Programming language: C++14 (for new Library part). Nature of problem: Finding a method for computing and updating the Pfaffian and inverse of a skew-symmetric matrix that yields high performance on modern processor architectures. Solution method: Deploying a blocked version of the Parlett-Reid algorithm with BLIS serving as the assembly-level backend. Updates are performed using a modified Woodbury matrix identity. References: [1] T. Misawa, S. Morita, K. Yoshimi, M. Kawamura, Y. Motoyama, K. Ido, T. Ohgoe, M. Imada and T. Kato, Comput. Phys. Commun. 235 (2019) 447–462.
AB - In this article, we present a high-performance, portable, and well-templated implementation for computing and fast-updating the Pfaffian and inverse of an even-ranked skew-symmetric (antisymmetric) matrix. This is achieved with a skew-symmetric, blocked variant of the Parlett-Reid algorithm and a blocked update scheme based on the Woodbury matrix identity. Installing this framework into the geminal-wavefunction-based many-variable Variational Monte Carlo (mVMC) code speeds up sampling by a factor of up to more than 6 without changing the Markov chain's behavior. The implementation is based on an extension of the BLAS-like Library Instantiation Software (BLIS) framework, which has optimized kernels for many state-of-the-art processors, including Intel Skylake-X, AMD EPYC Rome, and Fujitsu A64FX. Program summary. Program title: Pfaffine and PfUpdates library for mVMC [1]. CPC Library link to program files: https://doi.org/10.17632/rz9rs8cpws.1 Developer's repository link: https://github.com/issp-center-dev/mVMC/tree/master/src/pfupdates, https://github.com/xrq-phys/Pfaffine Licensing provisions: MPL-2.0 (for new Library part). Programming language: C++14 (for new Library part). Nature of problem: Finding a method for computing and updating the Pfaffian and inverse of a skew-symmetric matrix that yields high performance on modern processor architectures. Solution method: Deploying a blocked version of the Parlett-Reid algorithm with BLIS serving as the assembly-level backend. Updates are performed using a modified Woodbury matrix identity. References: [1] T. Misawa, S. Morita, K. Yoshimi, M. Kawamura, Y. Motoyama, K. Ido, T. Ohgoe, M. Imada and T. Kato, Comput. Phys. Commun. 235 (2019) 447–462.
KW - BLAS
KW - Blocked algorithm
KW - Ground-state method
KW - LAPACK
KW - Pfaffian
KW - Quantum lattice model
KW - Skew-symmetric matrix
KW - Variational Monte Carlo
UR - http://www.scopus.com/inward/record.url?scp=85129102331&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129102331&partnerID=8YFLogxK
U2 - 10.1016/j.cpc.2022.108375
DO - 10.1016/j.cpc.2022.108375
M3 - Article
AN - SCOPUS:85129102331
SN - 0010-4655
VL - 277
JO - Computer Physics Communications
JF - Computer Physics Communications
M1 - 108375
ER -