Machine Learning Methods in Software Engineering

Multi-Threshold Code Clone Detection

This project aims to improve lexical methods of code clone detection. The proposed approach can be applied to any token-based tool: it runs the search several times with different parameters and merges the results. The necessary parameters are estimated, and the method is evaluated on two token-based clone detection tools, SourcererCC and CloneWorks.
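The idea of running the search with several parameter sets and merging the results can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy bag-of-tokens detector, the function names, and the choice of parameters (a minimum fragment length in tokens and a similarity threshold) are hypothetical stand-ins, not the actual algorithms, interfaces, or parameter values of SourcererCC or CloneWorks.

```python
from collections import Counter
from itertools import combinations


def overlap_similarity(a, b):
    """Number of shared tokens (as a multiset) divided by the longer fragment's length."""
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


def detect_clones(fragments, min_tokens, threshold):
    """Toy stand-in for a token-based tool (hypothetical, not a real tool's API).

    Reports every pair of fragments that both have at least `min_tokens`
    tokens and whose overlap similarity is at least `threshold`.
    """
    eligible = {name: toks for name, toks in fragments.items()
                if len(toks) >= min_tokens}
    pairs = set()
    for (n1, t1), (n2, t2) in combinations(sorted(eligible.items()), 2):
        if overlap_similarity(t1, t2) >= threshold:
            pairs.add((n1, n2))
    return pairs


def multi_threshold_search(fragments, configs):
    """Run the detector once per (min_tokens, threshold) pair and merge the results."""
    merged = set()
    for min_tokens, threshold in configs:
        merged |= detect_clones(fragments, min_tokens, threshold)
    return merged


# Illustrative input: fragment names mapped to their token lists.
fragments = {
    "a": "int x = 0 ;".split(),
    "b": "int x = 0 ;".split(),
    "c": "for i in range n : total += i ; print total".split(),
    "d": "for j in range n : total += j ; print total".split(),
}

# Assumed configurations: an exact-match search over all fragments of at
# least 3 tokens, and a looser search restricted to fragments of 10+ tokens.
configs = [(3, 1.0), (10, 0.8)]

result = multi_threshold_search(fragments, configs)
print(sorted(result))
```

Because the merge is a set union, a pair found under several configurations is reported once. In this sketch the first configuration finds only the identical pair, and the second finds only the near-identical pair of longer fragments.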

A modified version of SourcererCC is available on GitHub.



Multi-Threshold Token-Based Code Clone Detection

March 2021

Yaroslav Golubev, Viktor Poletansky, Nikita Povarov, and Timofey Bryksin

