The increasing number of security vulnerabilities has become an important problem that needs to be solved urgently in the field of software security, which means that the current vulnerability mining technology still has great potential for development. However, most of the existing AI-based vulnerability detection methods focus on designing different AI models to improve the accuracy of vulnerability detection, ignoring the fundamental problems of data-driven AI-based algorithms: first, there is a lack of sufficient high-quality vulnerability data; second, there is no unified standardized construction method to meet the standardized evaluation of different vulnerability detection models. This all greatly limits security personnel's in-depth research on vulnerabilities. In this survey, we review the current literature on building high-quality vulnerability datasets, aiming to investigate how state-of-the-art research has leveraged data mining and data processing techniques to generate vulnerability datasets to facilitate vulnerability discovery. We also identify the challenges of this new field and share our views on potential research directions.