Gathering the dataset of semantic clones

This project is a collaboration with Hannes Thaller from Johannes Kepler University. Its goal is to collect a dataset of semantic code clones, that is, fragments of code that implement the same functionality in different ways. Hannes needs such a dataset to evaluate the method he developed for detecting these clones using probabilistic software modelling. The dataset draws on problems from the Google Code Jam and AtCoder programming contests.

