Semyon Grigorev
Biography
 2006 – 2012 Saint Petersburg State University, master's degree in Information Technology, thesis: Automated transformation of dynamic SQL queries in information system reengineering.
 2012 – 2016 Ph.D. student at Saint Petersburg State University. Ph.D. thesis: Parsing of dynamically generated code.
 2013 – 2017 Senior Lecturer at Saint Petersburg State University.
 2012 – to date Researcher at JetBrains.
 2017 – to date Associate Professor at Saint Petersburg State University.
Professional Activity
Research interests
Formal language theory and its applications in biology, graph databases, and static code analysis; formal grammars and other language specification formalisms; parallel and asynchronous computations.
Teaching
 Formal Language Theory
 Graph Theory
 Algorithms and Data Structures
 Practice of Programming
Grants
 Russian Foundation for Basic Research grant 19-37-90101 (2019–today)
 Russian Science Foundation grant 18-11-00100 (2018–today)
 Russian Foundation for Basic Research grant 18-01-00380 А (2018–today)
 Russian Foundation for Basic Research grant 15-01-05431 А (2015–2017)
Conferences
 SEIM 2017, 2018, 2019: PC member
 CIBB 2019 (Algebraic and Computational Methods for the Study of RNA Behaviour): PC member
Other
 Invited research speaker at Inria LINKS
 Best research paper in the field of software engineering and winner of the Bertrand Meyer’s Award (SECR 2014)
Academic Advising
Projects

.NET and GPGPU integration based on a translator from F# quotations to OpenCL. Project supervisor: Semyon Grigorev

Modular tool for parser construction and grammar processing. Project supervisor: Semyon Grigorev

The composition of formal grammars and artificial neural networks for secondary structure analysis. Project supervisor: Semyon Grigorev

A space for research and development of context-free path querying algorithms. Project supervisor: Semyon Grigorev

Is it possible to use partial evaluation for GPGPU program optimization? Project supervisor: Daniil Berezun
Publications

Proceedings of the Institute for System Programming, June 2020
This paper presents a modification of Valiant’s algorithm whose main advantage is that the parsing table can be divided into successively computed layers of disjoint submatrices, where each submatrix of a layer can be processed independently. Moreover, our approach is easily adapted to the string-matching problem.

GRADES-NDA '20: Proceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), June 2020
A recent study showed that the applicability of context-free path querying (CFPQ) algorithms with relational query semantics integrated with graph databases is limited because of the low performance and high memory consumption of existing solutions. In this work, we implement a matrix-based CFPQ algorithm using appropriate high-performance libraries for linear algebra and integrate it with the RedisGraph graph database. We also introduce a new CFPQ algorithm with single-path query semantics that allows us to extract one found path for each pair of nodes. Finally, we evaluate our algorithms for both semantics; the evaluation shows that the matrix-based CFPQ implementation for the RedisGraph database is performant enough for real-world data analysis.
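The matrix-based CFPQ idea can be illustrated in a few lines. The sketch below is a minimal pure-Python illustration, not the RedisGraph integration described above (which relies on high-performance linear algebra libraries); it assumes the grammar is in Chomsky normal form without ε-rules, and the function names are hypothetical. For each nonterminal A it keeps a Boolean adjacency matrix T[A] and repeatedly applies T[A] |= T[B] × T[C] for every rule A → B C until nothing changes, which yields the relational query semantics.

```python
from itertools import product


def bool_mult(a, b):
    """Boolean matrix product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cfpq(n, edges, term_rules, bin_rules):
    """Hypothetical helper: relational-semantics CFPQ with Boolean matrices.

    n          -- number of vertices, numbered 0..n-1
    edges      -- list of (u, label, v) triples
    term_rules -- list of (A, a) for rules A -> a
    bin_rules  -- list of (A, B, C) for rules A -> B C (CNF, no epsilon rules)
    Returns {A: T_A} where T_A[u][v] is True iff some path u -> v
    spells a word derivable from A.
    """
    nonterms = {lhs for lhs, _ in term_rules} | {x for rule in bin_rules for x in rule}
    T = {A: [[False] * n for _ in range(n)] for A in nonterms}
    # Initialisation: each terminal rule A -> a marks every edge labelled a.
    for u, label, v in edges:
        for A, a in term_rules:
            if a == label:
                T[A][u][v] = True
    # Fixpoint: T_A |= T_B x T_C for each rule A -> B C until nothing changes.
    changed = True
    while changed:
        changed = False
        for A, B, C in bin_rules:
            prod = bool_mult(T[B], T[C])
            for i, j in product(range(n), repeat=2):
                if prod[i][j] and not T[A][i][j]:
                    T[A][i][j] = True
                    changed = True
    return T
```

For the grammar S → A B | A S1, S1 → S B, A → a, B → b (the language aⁿbⁿ) and the path 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4, the matrix for S marks exactly the pairs (1, 3) and (0, 4). The loop terminates because matrix cells only ever change from False to True.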

PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2020
While GPU utilization allows one to speed up computations by orders of magnitude, memory management remains the bottleneck, often making it a challenge to achieve the desired performance. Hence, various memory optimizations are leveraged to use memory more effectively. We propose an approach that automates memory management using partial evaluation, a program transformation technique that enables data accesses to be precomputed, optimized, and embedded into the code, saving memory transactions. An empirical evaluation of our approach shows that the transformed program can be up to 8 times as efficient as the original one in the case of a naïve string pattern matching algorithm implemented in CUDA C.
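To give a feel for what partial evaluation does here, the following toy Python sketch (not CUDA C, and not the authors' tool) specializes a naive string matcher to a pattern that is known in advance. The pattern is treated as static input, the inner loop is unrolled, and the pattern characters become literals in the generated code instead of values read from memory. The function names are hypothetical, and `exec` on generated source merely stands in for a real partial evaluator.

```python
def naive_match(pattern, text):
    """Generic matcher: pattern characters are looked up on every comparison."""
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        if all(text[i + k] == pattern[k] for k in range(len(pattern))):
            hits.append(i)
    return hits


def specialize(pattern):
    """Toy partial evaluator (hypothetical): `pattern` is static, `text` is dynamic.

    The inner loop over the pattern is unrolled and each pattern character is
    embedded as a literal, so the residual program never reads `pattern`.
    Returns the residual function and its source code.
    """
    checks = " and ".join(
        f"text[i + {k}] == {ch!r}" for k, ch in enumerate(pattern)) or "True"
    src = (
        "def match(text):\n"
        "    hits = []\n"
        f"    for i in range(len(text) - {len(pattern)} + 1):\n"
        f"        if {checks}:\n"
        "            hits.append(i)\n"
        "    return hits\n"
    )
    namespace = {}
    exec(src, namespace)  # compile the residual (specialized) program
    return namespace["match"], src
```

Calling `specialize("ab")` returns a `match` function whose condition reads `text[i + 0] == 'a' and text[i + 1] == 'b'`; on "xabab" both the generic and the specialized version report positions 1 and 3. On a GPU, the analogous effect is that the static pattern moves from memory into the instruction stream, saving memory transactions.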

Programming and Computer Software, December 2019
Path querying with conjunctive grammars is known to be undecidable. There is an algorithm for path querying with linear conjunctive grammars which provides an overapproximation of the result, but there is no algorithm for arbitrary conjunctive grammars. We propose the first algorithm for path querying with arbitrary conjunctive grammars. The proposed algorithm is matrix-based and allows us to efficiently apply GPGPU computing techniques and other optimizations for matrix operations.
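As an illustration only (the paper's algorithm may differ in its details), the following Python sketch shows one natural way to extend matrix-based path querying to conjunctive rules of the form A → B1 C1 & ... & Bn Cn: compute the Boolean product for each conjunct and intersect the results element-wise. The function names and the rule encoding are hypothetical. Because each conjunct may be witnessed by a different path between the same two vertices, such a fixpoint can only over-approximate the exact answer.

```python
from itertools import product


def bool_mult(a, b):
    """Boolean matrix product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bool_and(a, b):
    """Element-wise conjunction of two Boolean matrices."""
    n = len(a)
    return [[a[i][j] and b[i][j] for j in range(n)] for i in range(n)]


def conj_cfpq(n, edges, term_rules, conj_rules):
    """Hypothetical sketch; conj_rules holds (A, [(B1, C1), (B2, C2), ...])
    for rules A -> B1 C1 & B2 C2 & ...; term_rules holds (A, a) for A -> a."""
    nonterms = {lhs for lhs, _ in term_rules}
    for A, conjuncts in conj_rules:
        nonterms.add(A)
        for B, C in conjuncts:
            nonterms.update((B, C))
    T = {A: [[False] * n for _ in range(n)] for A in nonterms}
    for u, label, v in edges:
        for A, a in term_rules:
            if a == label:
                T[A][u][v] = True
    changed = True
    while changed:
        changed = False
        for A, conjuncts in conj_rules:
            # Product for the first conjunct, then intersect with the others.
            B, C = conjuncts[0]
            acc = bool_mult(T[B], T[C])
            for B, C in conjuncts[1:]:
                acc = bool_and(acc, bool_mult(T[B], T[C]))
            for i, j in product(range(n), repeat=2):
                if acc[i][j] and not T[A][i][j]:
                    T[A][i][j] = True
                    changed = True
    return T
```

For example, take the edges 0 -a-> 1 -b-> 2 and 0 -c-> 3 -d-> 2 with A → a, B → b, C → c, D → d. The rule S → A B & C D describes the empty language, since no string equals both ab and cd, yet the sketch reports the pair (0, 2) for S: this is precisely the over-approximation noted in the lead-in.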
 BMC Bioinformatics, November 2019

Proceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), June 2019
A recently proposed algorithm for context-free path querying (CFPQ) based on matrix multiplication offloads the most performance-critical parts onto Boolean matrix multiplication. Thus, it is possible to achieve high CFPQ performance by means of modern parallel hardware and software. In this paper, we provide the results of an empirical performance comparison of different implementations of this algorithm on both real-world data and synthetic worst-case data.

Logic, Language, Information, and Computation, June 2019
The Bar-Hillel theorem states that context-free languages are closed under intersection with a regular set. This theorem has a constructive proof and thus provides a formal justification of the correctness of the algorithms for the applications mentioned above. Mechanization of the Bar-Hillel theorem, therefore, is both a fundamental result of formal language theory and a basis for the certified implementation of these algorithms. In this work, we present a mechanized proof of the Bar-Hillel theorem in Coq.
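The constructive proof behind the theorem is usually presented as a triple construction: for a grammar in Chomsky normal form and a finite automaton, a new nonterminal (p, A, q) derives exactly the words that A derives and that lead the automaton from state p to state q. The Python sketch below shows only that textbook construction; it is not the Coq development, and it assumes a CNF grammar without ε-rules and a partial DFA given as a dictionary. All names are hypothetical. A standard emptiness check is included so that the result can be tested.

```python
from itertools import product


def bar_hillel(term_rules, bin_rules, start, states, delta, q0, finals):
    """Triple construction for L(G) intersected with L(M) (hypothetical helper).

    term_rules -- (A, a) for A -> a;  bin_rules -- (A, B, C) for A -> B C
    delta      -- dict (state, symbol) -> state (a partial DFA)
    Returns the new terminal rules, binary rules and the set of start
    nonterminals (q0, start, f), one for each final state f.
    """
    new_terms = [((p, A, q), a) for A, a in term_rules
                 for (p, sym), q in delta.items() if sym == a]
    new_bins = [((p, A, q), (p, B, r), (r, C, q))
                for A, B, C in bin_rules
                for p, r, q in product(states, repeat=3)]
    new_starts = [(q0, start, f) for f in finals]
    return new_terms, new_bins, new_starts


def is_nonempty(term_rules, bin_rules, starts):
    """Standard fixpoint over generating nonterminals of a CNF grammar."""
    gen = {A for A, _ in term_rules}
    changed = True
    while changed:
        changed = False
        for A, B, C in bin_rules:
            if A not in gen and B in gen and C in gen:
                gen.add(A)
                changed = True
    return any(s in gen for s in starts)
```

For the aⁿbⁿ grammar S → A B | A S1, S1 → S B, A → a, B → b, intersecting with a DFA that accepts only aabb gives a non-empty language, while a DFA that accepts only aab gives an empty one.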

Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies – BIOINFORMATICS, March 2019
We propose a way to combine formal grammars and artificial neural networks for processing biological sequences. Formal grammars encode the secondary structure of the sequence, and neural networks deal with mutations and noise. In contrast to the classical approach, in which probabilistic grammars are used for secondary structure modeling, we propose using arbitrary (non-probabilistic) grammars, which simplifies grammar creation. Instead of modeling the structure of the whole sequence, we create a grammar that only describes features of the secondary structure. Then we use matrix-based parsing to extract features: the fact that some substring can be derived from some nonterminal is a feature. After that, we use a dense neural network to process the features.
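The feature extraction step can be sketched as follows. The paper uses matrix-based parsing; for brevity this Python sketch swaps in plain CYK table filling, which computes the same derivability facts for a CNF grammar, only less efficiently. The toy grammar, in which S derives perfectly nested stems of complementary bases such as gc or agcu, is hypothetical and much simpler than a realistic secondary-structure grammar, and the neural network part is omitted. Each cell T[S][i][j] states whether seq[i:j] derives from S, and the flattened upper triangle becomes a 0/1 feature vector.

```python
PAIRS = [("a", "u"), ("u", "a"), ("c", "g"), ("g", "c")]


def stem_grammar():
    """Hypothetical toy CNF grammar: S derives nested stems such as 'gc', 'agcu'."""
    term_rules = [(f"N_{x}", x) for x in "acgu"]
    bin_rules = []
    for x, y in PAIRS:
        bin_rules.append(("S", f"N_{x}", f"N_{y}"))   # innermost pair x y
        bin_rules.append(("S", f"N_{x}", f"R_{y}"))   # x S y, split for CNF
        bin_rules.append((f"R_{y}", "S", f"N_{y}"))
    return term_rules, bin_rules


def cyk_table(seq, term_rules, bin_rules):
    """T[A][i][j] is True iff seq[i:j] is derivable from A (0 <= i < j <= n)."""
    n = len(seq)
    nonterms = {lhs for lhs, _ in term_rules} | {x for r in bin_rules for x in r}
    T = {A: [[False] * (n + 1) for _ in range(n + 1)] for A in nonterms}
    for i, ch in enumerate(seq):
        for A, a in term_rules:
            if a == ch:
                T[A][i][i + 1] = True
    # Fill spans by increasing length, so both halves of a split are ready.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for A, B, C in bin_rules:
                    if T[B][i][k] and T[C][k][j]:
                        T[A][i][j] = True
    return T


def features(seq, nonterminal="S"):
    """Flatten the upper triangle of T[nonterminal] into a 0/1 feature vector."""
    T = cyk_table(seq, *stem_grammar())
    n = len(seq)
    return [int(T[nonterminal][i][j]) for i in range(n) for j in range(i + 1, n + 1)]
```

For agcu, S holds for the substrings gc and agcu, so the 10-element vector contains exactly two ones. Vectors of this kind would then be the input to the dense network.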

Proceedings of the Institute for System Programming, January 2019
One of the problems in graph data analysis is querying for specific paths. Such queries are usually expressed by means of a formal grammar that describes the allowed labeling of path edges. A path query is said to be evaluated using relational query semantics if its result is the set of triples (A, v1, v2) such that there is a path from v1 to v2 whose edge labels form a string derivable from the nonterminal A. We focus on Boolean languages, which use Boolean grammars to describe path labeling. Although path querying using relational query semantics and Boolean grammars is known to be undecidable, in this work we propose a path querying algorithm for acyclic graphs that uses relational query semantics and Boolean grammars and approximates the exact solution. The restriction to acyclic graphs allows the algorithm to achieve better performance than the naive one.

Proceedings of the 9th ACM SIGPLAN International Symposium on Scala, September 2018
Transparent integration of a domain-specific language for the specification of context-free path queries (CFPQs) into a general-purpose programming language, as well as static checking of errors in queries, may greatly simplify the development of applications using CFPQs. LINQ and ORM can be used for the integration, but they have issues with flexibility: query decomposition and reuse of subqueries are a challenge. Adapting the parser combinators technique to path querying may solve these problems. Conventional parser combinators process linear input, and only the Trails library is known to apply this technique to path querying. We demonstrate that it is possible to create general parser combinators for CFPQ which support arbitrary context-free grammars and arbitrary input graphs. We implement a library of such parser combinators and show that it is applicable to realistic tasks.

September 2018
Extended abstract at TyDe 2018 (at ICFP).
 GRADES-NDA '18: Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), June 2018

Proceedings of the 13th Central & Eastern European Software Engineering Conference in Russia (CEESECR '17), December 2017
There are several solutions for CFPQ, but how to provide a structural representation of the query result that is practical for answer processing and debugging is still an open problem. In this paper we propose a graph parsing technique which allows one to build such a representation with respect to a given grammar in polynomial time and space for an arbitrary context-free grammar and graph. The proposed algorithm is based on the generalized LL (GLL) parsing algorithm, whereas previous solutions are mostly based on CYK or Earley algorithms; relying on GLL reduces time complexity in some cases.
 arXiv, July 2017
 Proceedings of the Institute for System Programming, August 2016

Perspectives of System Informatics, June 2016
We present a technique for syntax analysis of a regular set of input strings. This problem is relevant for the analysis of string-embedded languages, when a host program generates clauses of an embedded language at run time. Our technique is based on a generalization of the RNGLR algorithm, which inherently allows us to construct a finite representation of the parse forest for a regularly approximated set of input strings. This representation can be further utilized for semantic analysis and transformations in the context of reengineering, code maintenance, program understanding, etc. The approach implements relaxed parsing: unrecognized strings in the approximation set are ignored without error detection.
 Systems and Means of Informatics, 2016
 Proceedings of the 11th Central & Eastern European Software Engineering Conference in Russia, 2015
 Systems and Means of Informatics, 2015
 Proceedings of 10th International Andrei Ershov Memorial Conference on Perspectives of System Informatics, 2015
 Proceedings of the 10th Central and Eastern European Software Engineering Conference in Russia 2014, 2014
 Proceedings of the 9th Central & Eastern European Software Engineering Conference in Russia, 2013

Programming Languages and Tools Lab: head of group, researcher
 Formal Grammars and Languages
 Syntax Analysis
 Parsing Algorithms
 Bioinformatics