Enhancing Token-Based Clone Detection and Using It to Detect Possible License Violations on GitHub

With the constantly growing amount of open source software and the prevalence of services like GitHub and StackOverflow, the widespread reuse and borrowing of code is becoming an increasingly pressing problem. The situation is exacerbated by the complex relationships between various open source licenses, some compatible with each other and some not, which many developers do not fully understand.

At this seminar, we will briefly review the existing approaches to code clone detection and their limitations, as well as the specific features of code licensing, and will then discuss the results of a study conducted in our laboratory. The first part of the seminar will present a modification to token-based clone detection that allows developers to detect more clones of a more diverse nature. The second part will demonstrate and discuss the application of this clone detection to discovering possible code borrowing and license violations across popular Java code on GitHub.

Speaker: Yaroslav Golubev.

Language: Russian.

Date and Time: February 5th, 7:30 pm - 9 pm.

Place: Times, room 404.

