TY - JOUR
T1 - Generative Models for Extrapolation Prediction in Materials Informatics
AU - Hatakeyama-Sato, Kan
AU - Oyaizu, Kenichi
N1 - Funding Information:
This work was partially supported by Grants-in-Aid for Scientific Research (Nos. 17H03072, 18K19120, 18H05515, 20H05298, 21H02017, and 19K15638) from MEXT, Japan. The work was also partially supported by the Research Institute for Science and Engineering in Waseda University, and a research grant from the Center for Data Science in Waseda University, and Information Services International-Dentsu, Ltd.
PY - 2021/6/8
Y1 - 2021/6/8
N2 - We report a deep generative model for regression tasks in materials informatics. The model is introduced as a component of a data imputer and predicts more than 20 diverse experimental properties of organic molecules. The imputer is designed to predict material properties by "imagining" the missing data in the database, enabling the use of incomplete material data. Even removing 60% of the data does not diminish the prediction accuracy in a model task. Moreover, the model excels at extrapolation prediction, where target values of the test data are out of the range of the training data. Such an extrapolation has been regarded as an essential technique for exploring novel materials but has hardly been studied to date due to its difficulty. We demonstrate that the prediction performance can be improved by >30% by using the imputer compared with traditional linear regression and boosting models. The benefit becomes especially pronounced with few records for an experimental property (<100 cases) when prediction would be difficult by conventional methods. The presented approach can be used to more efficiently explore functional materials and break through previous performance limits.
AB - We report a deep generative model for regression tasks in materials informatics. The model is introduced as a component of a data imputer and predicts more than 20 diverse experimental properties of organic molecules. The imputer is designed to predict material properties by "imagining" the missing data in the database, enabling the use of incomplete material data. Even removing 60% of the data does not diminish the prediction accuracy in a model task. Moreover, the model excels at extrapolation prediction, where target values of the test data are out of the range of the training data. Such an extrapolation has been regarded as an essential technique for exploring novel materials but has hardly been studied to date due to its difficulty. We demonstrate that the prediction performance can be improved by >30% by using the imputer compared with traditional linear regression and boosting models. The benefit becomes especially pronounced with few records for an experimental property (<100 cases) when prediction would be difficult by conventional methods. The presented approach can be used to more efficiently explore functional materials and break through previous performance limits.
UR - http://www.scopus.com/inward/record.url?scp=85108642875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108642875&partnerID=8YFLogxK
U2 - 10.1021/acsomega.1c01716
DO - 10.1021/acsomega.1c01716
M3 - Article
AN - SCOPUS:85108642875
SN - 2470-1343
VL - 6
SP - 14566
EP - 14574
JO - ACS Omega
JF - ACS Omega
IS - 22
ER -