TY - GEN
T1 - Improving Syntactical Clone Detection Methods through the Use of an Intermediate Representation
AU - Caldeira, Pedro M.
AU - Sakamoto, Kazunori
AU - Washizaki, Hironori
AU - Fukazawa, Yoshiaki
AU - Shimada, Takahisa
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
N2 - Detection of type-3 and type-4 clones remains a difficult task. Current methods are complex, both conceptually and computationally, and using them requires substantial implementation effort. Instead of creating yet another method, it might be more productive to combine the simplicity of syntactic approaches with the abstractions granted by intermediate representations (IRs). To this end, we devised a C-like IR based on LLVM and ran NiCad on it (LLNiCad). To establish whether the clone detection capabilities of syntactic approaches can be improved through an IR, we compared NiCad and LLNiCad on three open source projects taken from Krutz's benchmark and a subset of Google Code Jam (GCJ) solutions. In our results, LLNiCad consistently outperforms NiCad in F1-score. Indeed, for all clone types in Krutz's benchmark, LLNiCad has an F1-score that is 37% higher than NiCad's, with better precision and better recall. For type-4 clones in our GCJ benchmark, the F1-score of LLNiCad also exceeds that of CCCD (a semantic clone detector) by 44%. These findings suggest that IRs are beneficial for improving clone detection and that they have a larger impact on type-3 and type-4 clones.
UR - http://www.scopus.com/inward/record.url?scp=85084338285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084338285&partnerID=8YFLogxK
U2 - 10.1109/IWSC50091.2020.9047637
DO - 10.1109/IWSC50091.2020.9047637
M3 - Conference contribution
AN - SCOPUS:85084338285
T3 - IWSC 2020 - Proceedings of the 2020 IEEE 14th International Workshop on Software Clones
SP - 8
EP - 14
BT - IWSC 2020 - Proceedings of the 2020 IEEE 14th International Workshop on Software Clones
A2 - Sajnani, Hitesh
A2 - Ragkhitwetsagul, Chaiyong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Workshop on Software Clones, IWSC 2020
Y2 - 18 February 2020
ER -