Research group

Information Systems Engineering Lab

There are two main research directions:

  • Design of Query Processing Engines.
    1. PosDB is a distributed disk-based column-store built with an emphasis on late materialization. The majority of contemporary column-stores support only early materialization, i.e. they operate on individual columns only at the lowest levels of the query plan. Usually such systems employ a columnar representation only during initial data scans and predicate filtering. Afterwards, they construct tuples and continue to process data similarly to row-stores. In contrast, the late materialization approach aims to delay tuple reconstruction as much as possible.

      The boom of column-stores featuring late materialization has left many open questions which we aim to address, namely: late materialization in a distributed environment, aggregation and late materialization, window functions and late materialization, late materialization for subqueries, and query optimization and physical design for systems with late materialization.
    2. ToyDBMS is a toolkit for teaching a DBMS development course. It consists of a set of problems, a reference implementation, and a testing system. The idea is to hand out a skeleton of a query engine which students gradually enhance throughout the semester. This way, at the end of the semester they obtain a system capable of processing moderately interesting queries. Our toolkit provides a performance leaderboard, allowing interested students to compete with each other. During the semester students implement DBMS components such as a query rewriter, a set of relational operators, a statistics module, and a simple rule-based query optimizer. The system is used in the “DBMS design” course at the Higher School of Economics (Saint-Petersburg, Russia) and at ITMO University (Saint-Petersburg, Russia).

      The aim of this project is to design new problems and improve the feature set of the testing system.
  • Discovery of functional (and other) dependencies.
    1. Algorithms for functional (and other) dependency discovery. FD discovery addresses the following problem: given a dataset (a table), find all functional dependencies that hold in this dataset. Such regularities in data are of interest to applied researchers since they allow them to formulate hypotheses and even draw conclusions regarding the data. Here, the main challenge is that such discovery is computationally very expensive. Even a relatively small dataset may require several days of runtime. In this project we focus on improving such algorithms and their components.
    2. High-performance functional (and other) dependency discovery. As in the previous project, here we aim to improve the performance of FD discovery. However, in this project we focus on the implementation aspects. We design efficient C++ implementations of such algorithms using AVX/AVX2 vectorization, GPU computation, and similar techniques.
    3. Query execution and functional dependencies in a DBMS. In this project, we propose to fuse the stored data (tables) and their functional dependencies together inside a DBMS. We aim to make FDs first-class citizens, that is, objects which can both be queried and be used to query data. Our idea is to allow analysts to explore both data and functional dependencies using the database interface. For example, an analyst may be interested in tasks such as "find all rows which prevent a given functional dependency from holding" or "for a given table, find all functional dependencies that involve a given attribute".
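
To make the dependency-related tasks above more concrete, here is a minimal Python sketch of checking whether a functional dependency holds and of finding the rows that prevent it from holding. It is an illustration only and not the lab's algorithms or implementation: the function names and the sample table are hypothetical, and the brute-force discovery routine only tests dependencies with a single attribute on each side.

```python
from collections import defaultdict

# Hypothetical sample table: a list of rows, each row a dict of attribute -> value.
SAMPLE_ROWS = [
    {"city": "Paris", "country": "France", "zip": "75001"},
    {"city": "Paris", "country": "France", "zip": "75002"},
    {"city": "Lyon", "country": "France", "zip": "69001"},
]


def fd_violations(rows, lhs, rhs):
    """Return sorted indices of rows that take part in a violation of lhs -> rhs.

    lhs and rhs are tuples of attribute names. Rows that agree on lhs but
    disagree on rhs are the ones that prevent the dependency from holding.
    """
    # Group row indices by LHS value, then by RHS value.
    groups = defaultdict(lambda: defaultdict(list))
    for index, row in enumerate(rows):
        lhs_value = tuple(row[a] for a in lhs)
        rhs_value = tuple(row[a] for a in rhs)
        groups[lhs_value][rhs_value].append(index)

    violating = []
    for by_rhs in groups.values():
        if len(by_rhs) > 1:  # one LHS value maps to several RHS values
            for indices in by_rhs.values():
                violating.extend(indices)
    return sorted(violating)


def fd_holds(rows, lhs, rhs):
    """The dependency lhs -> rhs holds exactly when no row violates it."""
    return not fd_violations(rows, lhs, rhs)


def discover_single_attribute_fds(rows):
    """Naive discovery: test every candidate A -> B with single attributes."""
    attributes = list(rows[0]) if rows else []
    return [
        (a, b)
        for a in attributes
        for b in attributes
        if a != b and fd_holds(rows, (a,), (b,))
    ]
```

On the sample table, `fd_holds(SAMPLE_ROWS, ("city",), ("country",))` is true, while `fd_violations(SAMPLE_ROWS, ("city",), ("zip",))` reports the two Paris rows. Even this restricted search tests every ordered pair of attributes; once left-hand sides may contain several attributes, the number of candidates grows exponentially with the number of columns, which gives some intuition for why discovery is expensive.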

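The difference between early and late materialization described for PosDB above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not PosDB code: the column layout, the function names, and the sample data are all hypothetical. Both functions answer the same query, the total of price times quantity over rows whose price reaches a threshold.

```python
# Hypothetical column-wise table: one list per attribute, aligned by position.
TABLE = {
    "id": [1, 2, 3, 4, 5],
    "price": [10, 55, 30, 70, 90],
    "qty": [3, 1, 8, 2, 9],
    "name": ["a", "b", "c", "d", "e"],
}


def total_early(table, min_price):
    """Early materialization: filter on a column, then build full tuples."""
    positions = [i for i, p in enumerate(table["price"]) if p >= min_price]
    columns = list(table)
    # Tuples are constructed right after filtering, including columns
    # ("id", "name") that the rest of the query never uses.
    tuples = [tuple(table[c][i] for c in columns) for i in positions]
    price_at, qty_at = columns.index("price"), columns.index("qty")
    # From here on, processing is row-wise, as in a row-store.
    return sum(t[price_at] * t[qty_at] for t in tuples)


def total_late(table, min_price):
    """Late materialization: pass positions along and fetch values last."""
    positions = [i for i, p in enumerate(table["price"]) if p >= min_price]
    # Only the position list flows between operators; the two needed
    # columns are read by position at the point where values are required.
    return sum(table["price"][i] * table["qty"][i] for i in positions)
```

Both functions return the same result for the same inputs; the late variant simply never touches the `id` and `name` columns, which is the kind of saving that delayed tuple reconstruction is after.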
Furthermore, apart from these large projects, there are several individual projects related to a variety of topics.

Contact us if you are interested in any of these projects.