TY - GEN
T1 - Vulnerability Dataset Construction Methods Applied To Vulnerability Detection
T2 - 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop, DSN-W 2022
AU - Lin, Yuhao
AU - Li, Ying
AU - Gu, Mianxue
AU - Sun, Hongyu
AU - Yue, Qiuling
AU - Hu, Jinglu
AU - Cao, Chunjie
AU - Zhang, Yuqing
N1 - Funding Information:
This work was supported by the Key Research and Development Science and Technology of Hainan Province (ZDYF202012, GHYF2022010) and the National Natural Science Foundation of China (U1836210).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - The increasing number of security vulnerabilities has become an important problem that needs to be solved urgently in the field of software security, which means that the current vulnerability mining technology still has great potential for development. However, most of the existing AI-based vulnerability detection methods focus on designing different AI models to improve the accuracy of vulnerability detection, ignoring the fundamental problems of data-driven AI-based algorithms: first, there is a lack of sufficient high-quality vulnerability data; second, there is no unified standardized construction method to meet the standardized evaluation of different vulnerability detection models. This all greatly limits security personnel's in-depth research on vulnerabilities. In this survey, we review the current literature on building high-quality vulnerability datasets, aiming to investigate how state-of-the-art research has leveraged data mining and data processing techniques to generate vulnerability datasets to facilitate vulnerability discovery. We also identify the challenges of this new field and share our views on potential research directions.
KW - datasets
KW - deep learning
KW - security vulnerabilities
UR - http://www.scopus.com/inward/record.url?scp=85136099824&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136099824&partnerID=8YFLogxK
U2 - 10.1109/DSN-W54100.2022.00032
DO - 10.1109/DSN-W54100.2022.00032
M3 - Conference contribution
AN - SCOPUS:85136099824
T3 - Proceedings - 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop Volume, DSN-W 2022
SP - 141
EP - 146
BT - Proceedings - 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop Volume, DSN-W 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 June 2022 through 30 June 2022
ER -