Machine Learning Methods in Software Engineering

Gathering the dataset of semantic clones

Project supervisor: Timofey Bryksin
Status: Active

This project is a collaboration with Hannes Thaller from Johannes Kepler University. Its goal is to collect a dataset of semantic code clones, that is, fragments of code that implement the same functionality in different ways. The task arose from Hannes's need to evaluate a method he developed for detecting such clones, which is based on probabilistic software modelling. The dataset uses problems from Google Code Jam and AtCoder.
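
As a small illustration of what a semantic clone pair looks like, the sketch below shows two functions that compute the same result in different ways. Both functions and their names are hypothetical and written only for this example; they are not taken from the dataset, and Python is used purely for illustration.

```python
def sum_to_n_loop(n: int) -> int:
    """Add the integers 1..n one at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_formula(n: int) -> int:
    """Compute the same sum with the closed form n * (n + 1) / 2."""
    return n * (n + 1) // 2
```

The two fragments share almost no syntax, yet they return the same value for every non-negative input, which is what makes such pairs hard to find with purely textual or token-based comparison.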

The project repository is available on GitHub.