Research group

Machine Learning Methods in Software Engineering

GitHub License Violations Study

Timofey Bryksin (Inactive)

In this project, a comprehensive plagiarism analysis is conducted on Java code fragments from GitHub. The project consists of three parts: gathering a large (1.5 TB) corpus of Java repositories, searching it for clones (using the approach proposed in our other project), and the analysis itself, which studies plagiarism and license violations in the obtained data. The discovered licenses and the relationships between them are studied in detail, and similar code fragments are ranked by how likely they are to constitute a license violation.
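The final step, ranking similar fragments by how likely they are to be a license violation, can be pictured with a small sketch. This is not the project's actual method: the `Fragment` record, the `COMPATIBLE` table, and the three-level score are all hypothetical. The sketch assumes that the fragment seen earlier is the origin, and its license table is a toy subset that ignores real-world details such as license versions and exceptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """One side of a detected clone pair (hypothetical record layout)."""
    repo: str
    license: Optional[str]  # SPDX-style id, or None if the repository declares no license
    first_seen: int         # e.g. Unix time of the earliest commit containing the fragment


# Hypothetical, heavily simplified table: (origin, borrower) is listed when code
# under the origin license may be reused in a project under the borrower license.
COMPATIBLE = {
    ("MIT", "MIT"), ("MIT", "Apache-2.0"), ("MIT", "GPL-3.0"),
    ("Apache-2.0", "Apache-2.0"), ("Apache-2.0", "GPL-3.0"),
    ("GPL-3.0", "GPL-3.0"),
}


def violation_score(a: Fragment, b: Fragment) -> int:
    """Score a clone pair: 2 = likely violation, 1 = unclear, 0 = unlikely."""
    # Assumption: the fragment seen earlier is the origin, the later one borrowed it.
    origin, borrower = (a, b) if a.first_seen <= b.first_seen else (b, a)
    if origin.license is None or borrower.license is None:
        return 1  # a missing license leaves the reuse terms unclear
    if (origin.license, borrower.license) in COMPATIBLE:
        return 0
    return 2  # both sides licensed, but the combination is not known to be compatible


def rank_pairs(pairs: List[Tuple[Fragment, Fragment]]) -> List[Tuple[Fragment, Fragment]]:
    """Order clone pairs from most to least likely license violation."""
    return sorted(pairs, key=lambda p: violation_score(*p), reverse=True)


if __name__ == "__main__":
    lib = Fragment("org/lib", "GPL-3.0", 100)
    app = Fragment("user/app", "MIT", 200)
    print(violation_score(app, lib))  # 2: GPL-3.0 code reused in an MIT project
```

Because the origin is chosen by `first_seen`, the score does not depend on the order in which a clone detector reports the two fragments; a real analysis would need far richer license data than this toy table.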

The project's repository on GitHub.



On the Nature of Code Cloning in Open-Source Java Projects

October 2021

Yaroslav Golubev and Timofey Bryksin


A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub

June 2020

Yaroslav Golubev, Maria Eliseeva, Nikita Povarov and Timofey Bryksin
