TY - JOUR
T1 - HMM-based attacks on Google’s ReCAPTCHA with continuous visual and audio symbols
AU - Sano, Shotaro
AU - Otsuka, Takuma
AU - Itoyama, Katsutoshi
AU - Okuno, Hiroshi G.
PY - 2015/11/15
Y1 - 2015/11/15
N2 - CAPTCHAs distinguish humans from automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual characters or audio utterances. State-of-the-art research suggests that the security of visual and audio CAPTCHAs lies mainly in anti-segmentation techniques, because individual symbols, once segmented, can be recognized with a high success rate by certain machine learning algorithms. Thus, most recent commercial CAPTCHAs present continuous symbols to prevent automated segmentation. We propose a novel framework that automatically decodes continuous CAPTCHAs and assess its effectiveness with actual CAPTCHA questions from Google’s reCAPTCHA. Our framework builds on a sequence recognition method based on hidden Markov models (HMMs), which can be concisely implemented using an off-the-shelf HMM toolkit library. This method concatenates several HMMs, each of which recognizes a symbol, to build a larger HMM that recognizes a question. Our experimental results reveal vulnerabilities in continuous CAPTCHAs: the solver cracks the visual and audio reCAPTCHA systems with 31.75% and 58.75% accuracy, respectively. On the basis of synthetic experiments with simulated continuous CAPTCHAs, we further propose guidelines to prevent attacks by HMM-based CAPTCHA solvers.
AB - CAPTCHAs distinguish humans from automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual characters or audio utterances. State-of-the-art research suggests that the security of visual and audio CAPTCHAs lies mainly in anti-segmentation techniques, because individual symbols, once segmented, can be recognized with a high success rate by certain machine learning algorithms. Thus, most recent commercial CAPTCHAs present continuous symbols to prevent automated segmentation. We propose a novel framework that automatically decodes continuous CAPTCHAs and assess its effectiveness with actual CAPTCHA questions from Google’s reCAPTCHA. Our framework builds on a sequence recognition method based on hidden Markov models (HMMs), which can be concisely implemented using an off-the-shelf HMM toolkit library. This method concatenates several HMMs, each of which recognizes a symbol, to build a larger HMM that recognizes a question. Our experimental results reveal vulnerabilities in continuous CAPTCHAs: the solver cracks the visual and audio reCAPTCHA systems with 31.75% and 58.75% accuracy, respectively. On the basis of synthetic experiments with simulated continuous CAPTCHAs, we further propose guidelines to prevent attacks by HMM-based CAPTCHA solvers.
KW - CAPTCHA
KW - Continuous character/speech recognition
KW - Hidden Markov model
KW - Human interaction proof
UR - http://www.scopus.com/inward/record.url?scp=84947276087&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947276087&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.23.814
DO - 10.2197/ipsjjip.23.814
M3 - Article
AN - SCOPUS:84947276087
SN - 0387-5806
VL - 23
SP - 814
EP - 826
JO - Journal of Information Processing
JF - Journal of Information Processing
IS - 6
ER -