Research group

Machine Learning Methods in Software Engineering

Coding Assistant

Timofey Bryksin

Active

The goal of the project is to study how students behave while solving diverse programming tasks and to build an assistance system based on previous solutions. The idea is to merge all partial solutions to each problem into a single graph, the solution space, and to find the best path through this graph to a correct solution in order to generate hints about the next steps for new students.
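The solution-space idea can be illustrated with a minimal sketch. It is not the project's actual algorithm: it assumes each partial solution has already been reduced to a hashable label, that an edge is weighted by how many students took that transition, and that the "best" path is the cheapest one when each edge costs 1 / count. The names `build_solution_space` and `next_step_hint` are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count


def build_solution_space(attempts):
    """Merge every student's sequence of snapshots into one weighted graph.

    Each snapshot is a hashable state (here: a string label). The weight of
    an edge is the number of students who moved directly between two states.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for snapshots in attempts:
        for src, dst in zip(snapshots, snapshots[1:]):
            if src != dst:  # ignore repeated identical snapshots
                graph[src][dst] += 1
    return graph


def next_step_hint(graph, state, goals):
    """Return the next state on a cheapest path from `state` to any goal.

    Assumption: an edge costs 1 / count, so popular transitions are cheap.
    Returns None if `state` is already a goal or no goal is reachable.
    """
    if state in goals:
        return None
    tie = count()  # tie-breaker so the heap never compares states directly
    queue = [(0.0, next(tie), state, None)]  # cost, tie, node, first step
    visited = set()
    while queue:
        cost, _, node, first = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node in goals:
            return first
        for nxt, taken in graph.get(node, {}).items():
            if nxt not in visited:
                step = nxt if first is None else first
                heapq.heappush(queue, (cost + 1.0 / taken, next(tie), nxt, step))
    return None


# Three made-up attempts at the same task; the labels are illustrative only.
attempts = [
    ["empty", "read_input", "loop", "correct"],
    ["empty", "read_input", "loop", "correct"],
    ["empty", "read_input", "recursion", "correct"],
]
space = build_solution_space(attempts)
print(next_step_hint(space, "read_input", {"correct"}))  # prints "loop"
```

In a real system the states would be normalized code fragments rather than labels, and the cost function would likely account for more than transition frequency.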

To achieve this goal, we developed a set of tools for collecting and processing data on students' activity during problem solving. The first tool is a plugin for IntelliJ-based IDEs that captures code snapshots and IDE interaction events while code is being written, which lets us analyze the programming process; the plugin currently supports Python, Java, Kotlin, and C++. The second tool post-processes, analyzes, and visualizes the data collected by the plugin.

To validate and showcase the toolkit, we have gathered a small dataset. It describes in detail how 148 participants of different ages and programming skill levels, using different languages, solved programming tasks. To publish the dataset, we need to anonymize it in accordance with our privacy policy, and we developed a dedicated tool for this task.

We are currently working on a PyCharm plugin that unifies Python code by applying various transformations to the PSI (Program Structure Interface, the IntelliJ Platform's representation of source code), such as anonymizing variables and removing dead code. It will help us determine as accurately as possible whether syntactically different code fragments have the same semantics. We need this tool for the algorithm that generates hints (we previously implemented a prototype in Python; it can be found here), but it could be used elsewhere as well, for instance for semantic clone detection.
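To make the unification idea concrete, here is a rough sketch that uses Python's standard `ast` module as a stand-in for PSI. It is not the plugin's implementation. It assumes Python 3.9+ (for `ast.unparse`), renames only names bound inside the fragment, and treats "dead code" narrowly as statements that follow a `return` or `raise` in a function body. `Unifier` and `unify` are hypothetical names.

```python
import ast


class Unifier(ast.NodeTransformer):
    """Renames bound names to v0, v1, ... and drops unreachable statements."""

    def __init__(self):
        self.names = {}  # original name -> anonymized name

    def _alias(self, name):
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename a name only if this fragment binds it (parameter or
        # assignment target), so builtins such as `print` stay untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.names:
            node.id = self._alias(node.id)
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        kept = []
        for stmt in node.body:
            kept.append(stmt)
            if isinstance(stmt, (ast.Return, ast.Raise)):
                break  # everything after this statement is unreachable
        node.body = kept
        return node


def unify(source):
    """Return a normalized version of `source` for comparing fragments."""
    tree = Unifier().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


a = """
def f(xs):
    total = 0
    for x in xs:
        total += x
    return total
    print('never reached')
"""
b = """
def f(items):
    s = 0
    for i in items:
        s += i
    return s
"""
print(unify(a) == unify(b))  # prints True
```

Two fragments that differ only in variable names and unreachable statements normalize to the same text, which is the kind of equivalence check a hint-generation algorithm or a semantic clone detector could build on. A fuller tool would need proper scope analysis and many more transformations.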