Research group

Machine Learning Methods in Software Engineering

Code Clone Detection

Project supervisor: Timofey Bryksin
Status: Active

The project aims to improve lexical (token-based) methods of clone detection in code. The proposed approach can be applied to any token-based tool: the search is run several times with different parameters, and the results are merged. The necessary parameters are estimated, and the method is evaluated on two token-based clone detection tools, SourcererCC and CloneWorks.
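The idea of running the search several times and merging the results can be illustrated with a minimal Python sketch. It is not the project's implementation and does not use SourcererCC or CloneWorks: detect_clones is a toy stand-in for one run of a token-based detector, and the parameters shown (a similarity threshold and a token-length range), the sample fragments, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def overlap_similarity(a, b):
    """Number of shared tokens (as bags) divided by the size of the larger fragment."""
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


def detect_clones(fragments, threshold, min_tokens, max_tokens):
    """Toy stand-in for a single detector run (hypothetical parameters).

    Only fragments whose length lies in [min_tokens, max_tokens] are compared;
    a pair is reported when its overlap similarity reaches the threshold.
    """
    eligible = {
        name: tokens
        for name, tokens in fragments.items()
        if min_tokens <= len(tokens) <= max_tokens
    }
    pairs = set()
    for (n1, t1), (n2, t2) in combinations(sorted(eligible.items()), 2):
        if overlap_similarity(t1, t2) >= threshold:
            pairs.add((n1, n2))
    return pairs


def detect_with_merged_runs(fragments, configs):
    """Run the detector once per configuration and take the union of all results."""
    merged = set()
    for threshold, min_tokens, max_tokens in configs:
        merged |= detect_clones(fragments, threshold, min_tokens, max_tokens)
    return merged


# Illustrative data: two longer fragments and two short ones.
fragments = {
    "a": "int x = 0 ; x ++ ;".split(),
    "b": "int y = 0 ; y ++ ;".split(),
    "c": "return x ;".split(),
    "d": "return y ;".split(),
}

# Assumed configurations: a stricter threshold for longer fragments,
# a looser one for short fragments.
configs = [
    (0.7, 5, 100),
    (0.6, 1, 4),
]

merged = detect_with_merged_runs(fragments, configs)
print(sorted(merged))
```

In this sketch neither run alone reports both pairs: the first configuration skips the short fragments, and the second skips the long ones, so the union covers more than any single parameter setting.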

A modified version of SourcererCC is available on GitHub.

The developed approach is also employed in a complex plagiarism study of GitHub's Java code.