# Programming Languages and Tools Lab

## Context-Free Path Querying: Algorithms and Applications

This project is a workspace for developing, evaluating, and comparing context-free path querying (CFPQ) algorithms.

Main components:

- A dataset for CFPQ evaluation.
- A collection of CFPQ algorithms implemented on top of the GraphBLAS API.
- Our fork of RedisGraph, in which we are developing a CFPQ extension for RedisGraph.
- Meerkat: a parser combinator library for CFPQ.
- CoFRA: a CFL-reachability-based framework for developing static analysis tools. As a demonstration, it includes ReSharper and Rider plugins.
- YaccConstructor: a sandbox for developing CFPQ algorithms.

## Members

## Materials

## Publications

### Recursive Expressions for SPARQL Property Paths

August 2020

Ciro Medeiros, Umberto Costa, Semyon Grigorev, Martin A. Musicante

### Context-Free Path Querying by Kronecker Product

August 2020

Egor Orachev, Ilya Epelbaum, Rustam Azimov, Semyon Grigorev

Context-free path queries (CFPQ) extend regular path queries (RPQ) by allowing context-free grammars to be used as constraints on paths. Algorithms for CFPQ are actively developed, but J. Kuijpers et al. have recently concluded that existing algorithms are not performant enough to be used in real-world applications. Thus, the development of new CFPQ algorithms is justified. In this paper, we provide a new CFPQ algorithm that is based on linear algebra operations such as the Kronecker product and transitive closure, and that handles grammars represented as recursive state machines. Thus, the proposed algorithm can be implemented using high-performance libraries and modern parallel hardware. Moreover, it avoids grammar growth, which opens up possibilities for query optimization.
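
A minimal, dense-matrix sketch of the Kronecker-product idea, written under our own simplifying assumptions: all boxes of the RSM share one numbered state space, there are no ε-productions, boolean matrices are plain Python lists rather than a GraphBLAS-backed library, transitive closure uses Warshall's algorithm, and every round recomputes everything from scratch. The names `kronecker_cfpq`, `rsm_edges`, and `boxes` are hypothetical. Each round sums the Kronecker products of the RSM and graph matrices over all labels, closes the result transitively, and adds a nonterminal edge `(u, v)` whenever a box's start state reaches one of its final states while the graph walks from `u` to `v`. It stops when a round adds no new edges.

```python
from itertools import product

def to_matrix(edges, n):
    """Dense n x n boolean adjacency matrix from a set of (row, col) pairs."""
    m = [[False] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = True
    return m

def kron(a, b):
    """Boolean Kronecker product: cell (i*m + k, j*m + l) = a[i][j] and b[k][l]."""
    n, m = len(a), len(b)
    out = [[False] * (n * m) for _ in range(n * m)]
    for i, j in product(range(n), repeat=2):
        if a[i][j]:
            for k, l in product(range(m), repeat=2):
                if b[k][l]:
                    out[i * m + k][j * m + l] = True
    return out

def transitive_closure(a):
    """Warshall's algorithm: reachability by paths of length >= 1."""
    c = [row[:] for row in a]
    n = len(c)
    for k in range(n):
        for i in range(n):
            if c[i][k]:
                for j in range(n):
                    if c[k][j]:
                        c[i][j] = True
    return c

def kronecker_cfpq(rsm_edges, boxes, n_states, graph_edges, n_vertices):
    """
    rsm_edges:   label -> set of (state, state); labels are terminals or nonterminals
    boxes:       nonterminal -> (start_state, set_of_final_states)
    graph_edges: label -> set of (vertex, vertex); terminal labels only on input
    Returns nonterminal -> set of (u, v) pairs (relational semantics).
    """
    graph = {label: set(e) for label, e in graph_edges.items()}
    for nt in boxes:
        graph.setdefault(nt, set())
    size = n_states * n_vertices
    while True:
        # K = sum over labels of RSM[label] (x) Graph[label]
        k = [[False] * size for _ in range(size)]
        for label, r_edges in rsm_edges.items():
            prod_m = kron(to_matrix(r_edges, n_states),
                          to_matrix(graph.get(label, set()), n_vertices))
            for i, j in product(range(size), repeat=2):
                k[i][j] = k[i][j] or prod_m[i][j]
        closure = transitive_closure(k)
        new_edges = False
        for i, j in product(range(size), repeat=2):
            if closure[i][j]:
                s, u = divmod(i, n_vertices)  # index = state * n_vertices + vertex
                f, v = divmod(j, n_vertices)
                for nt, (start, finals) in boxes.items():
                    # Start-to-final run of nt's box synchronised with a graph path u -> v
                    if s == start and f in finals and (u, v) not in graph[nt]:
                        graph[nt].add((u, v))
                        new_edges = True
        if not new_edges:
            return {nt: graph[nt] for nt in boxes}

# S -> a S b | a b as one box: 0 -a-> 1, 1 -S-> 2, 2 -b-> 3, 1 -b-> 3 (final: 3)
rsm = {"a": {(0, 1)}, "S": {(1, 2)}, "b": {(2, 3), (1, 3)}}
boxes = {"S": (0, {3})}
# Linear graph 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4
graph = {"a": {(0, 1), (1, 2)}, "b": {(2, 3), (3, 4)}}
print(sorted(kronecker_cfpq(rsm, boxes, 4, graph, 5)["S"]))  # [(0, 4), (1, 3)]
```

The query returns the pairs connected by `ab` and `aabb`. The grammar is consumed directly as an RSM, without conversion to a normal form.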

### Context-Free Path Querying with Single-Path Semantics by Matrix Multiplication

June 2020

Arseniy Terekhov, Artyom Khoroshev, Rustam Azimov, Semyon Grigorev

A recent study showed that the applicability of context-free path querying (CFPQ) algorithms with relational query semantics integrated with graph databases is limited by the low performance and high memory consumption of existing solutions. In this work, we implement a matrix-based CFPQ algorithm using appropriate high-performance linear algebra libraries and integrate it with the RedisGraph graph database. We also introduce a new CFPQ algorithm with single-path query semantics that allows us to extract one path for each pair of nodes in the result. Finally, we evaluate our algorithms for both semantics; the evaluation shows that the matrix-based CFPQ implementation for the RedisGraph database is performant enough for real-world data analysis.
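
The matrix-based algorithm for a grammar in Chomsky normal form can be sketched as a fixpoint: initialize the matrix of each nonterminal `A` from the edges matching `A -> a`, then repeatedly add the boolean product `T[B] · T[C]` into `T[A]` for every rule `A -> B C` until nothing changes. In this sketch, sparse boolean matrices are Python dicts keyed by their nonzero cells rather than a high-performance library. For single-path semantics we assume, purely for illustration, that each cell stores the rule and middle vertex that first produced it; the paper's actual bookkeeping may differ. The names `cfpq_single_path` and `extract_path` are hypothetical.

```python
from collections import defaultdict

def cfpq_single_path(edges, terminal_rules, binary_rules):
    """
    Matrix-style CFPQ for a grammar in (weak) Chomsky normal form.
    edges:          iterable of (u, label, v)
    terminal_rules: list of (A, a) for productions A -> a
    binary_rules:   list of (A, B, C) for productions A -> B C
    Returns T: nonterminal -> {(u, v): witness}; the keys of T[A] are the nonzero
    cells of A's boolean matrix, and the witnesses allow recovering one path.
    """
    T = defaultdict(dict)
    for u, label, v in edges:
        for A, a in terminal_rules:
            if a == label:
                T[A].setdefault((u, v), ("edge", label))
    changed = True
    while changed:
        changed = False
        for A, B, C in binary_rules:
            # Boolean product T[B] x T[C], with C's cells grouped by row.
            rows_of_c = defaultdict(list)
            for k, j in T[C]:
                rows_of_c[k].append(j)
            for i, k in list(T[B]):
                for j in rows_of_c[k]:
                    if (i, j) not in T[A]:
                        # Keep only the first witness: the middle vertex k.
                        T[A][(i, j)] = ("split", B, C, k)
                        changed = True
    return T

def extract_path(T, A, u, v):
    """Rebuild one path u -> v whose labels derive from A, as (u, label, v) edges."""
    kind, *rest = T[A][(u, v)]
    if kind == "edge":
        return [(u, rest[0], v)]
    B, C, k = rest
    return extract_path(T, B, u, k) + extract_path(T, C, k, v)

# S -> a S b | a b in CNF: S -> A B | A S1, S1 -> S B, A -> a, B -> b
terminals = [("A", "a"), ("B", "b")]
binaries = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
T = cfpq_single_path(edges, terminals, binaries)
print(sorted(T["S"]))                                                # [(0, 4), (1, 3)]
print("".join(label for _, label, _ in extract_path(T, "S", 0, 4)))  # aabb
```

Because every witness points only at cells that already existed when it was recorded, the recursion in `extract_path` terminates even on cyclic graphs.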

### Context-Free Path Querying via Matrix Equations

June 2020

Yuliya Susanina

### Path Querying with Conjunctive Grammars by Matrix Multiplication

December 2019

Rustam Azimov, Semyon Grigorev

Path querying with conjunctive grammars is known to be undecidable. There is an algorithm for path querying with linear conjunctive grammars which provides an over-approximation of the result, but there is no algorithm for arbitrary conjunctive grammars. We propose the first algorithm for path querying with arbitrary conjunctive grammars. The proposed algorithm is matrix-based and allows us to efficiently apply GPGPU computing techniques and other optimizations for matrix operations.
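
One natural way to extend the matrix fixpoint to conjunctive rules `A -> B1 C1 & ... & Bn Cn` is to intersect the boolean products of all conjuncts. The sketch below does exactly that; it is our illustration rather than necessarily the paper's exact algorithm, and `conjunctive_cfpq` is a hypothetical name. On graphs, intersecting relations drops the requirement that all conjuncts be matched by the same path, so the result can be a superset of the exact answer, as the example shows.

```python
from collections import defaultdict

def multiply(x, y):
    """Boolean matrix product of two sparse matrices given as sets of (row, col)."""
    by_row = defaultdict(set)
    for k, j in y:
        by_row[k].add(j)
    return {(i, j) for i, k in x for j in by_row[k]}

def conjunctive_cfpq(edges, terminal_rules, conj_rules):
    """
    edges:          iterable of (u, label, v)
    terminal_rules: list of (A, a) for productions A -> a
    conj_rules:     list of (A, [(B1, C1), (B2, C2), ...]) for A -> B1 C1 & B2 C2 & ...
    Returns nonterminal -> set of (u, v); may over-approximate the exact answer.
    """
    T = defaultdict(set)
    for u, label, v in edges:
        for A, a in terminal_rules:
            if a == label:
                T[A].add((u, v))
    changed = True
    while changed:
        changed = False
        for A, conjuncts in conj_rules:
            # A cell survives only if every conjunct's product contains it.
            products = [multiply(T[B], T[C]) for B, C in conjuncts]
            new = set.intersection(*products) - T[A]
            if new:
                T[A] |= new
                changed = True
    return T

# S -> A B & C D, A -> a, B -> b, C -> c, D -> d: L(S) is {ab} intersected with {cd}, i.e. empty
terminals = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
rules = [("S", [("A", "B"), ("C", "D")])]
# Two different paths from 0 to 2: 0 -a-> 1 -b-> 2 and 0 -c-> 3 -d-> 2
edges = [(0, "a", 1), (1, "b", 2), (0, "c", 3), (3, "d", 2)]
print(conjunctive_cfpq(edges, terminals, rules)["S"])  # {(0, 2)}
```

Here `(0, 2)` is a false positive: `L(S)` is empty, but the two conjuncts are matched by different paths between the same pair of vertices.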

### Bar-Hillel Theorem Mechanization in Coq

June 2019

Sergey Bozhko, Leyla Khatbullina, Semyon Grigorev

The Bar-Hillel theorem states that context-free languages are closed under intersection with a regular set. This theorem has a constructive proof and thus provides a formal justification of the correctness of algorithms for applications that reduce to such an intersection. Mechanization of the Bar-Hillel theorem, therefore, is both a fundamental result of formal language theory and a basis for the certified implementation of such algorithms. In this work, we present a mechanized proof of the Bar-Hillel theorem in Coq.
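
For reference, the classical constructive proof can be stated as follows, assuming a grammar $G = (N, \Sigma, P, S)$ in Chomsky normal form without ε-productions and a finite automaton $M = (Q, \Sigma, \delta, q_0, F)$; the Coq development may organize the construction differently. The intersection grammar $G'$ has nonterminals $[p, A, q]$ for $p, q \in Q$ and $A \in N$, a fresh start symbol $S'$, and the productions

$$
\begin{aligned}
P' = \;& \{\, [p, A, q] \to a \mid (A \to a) \in P,\ q \in \delta(p, a) \,\} \\
\cup\;& \{\, [p, A, r] \to [p, B, q]\,[q, C, r] \mid (A \to B\,C) \in P,\ p, q, r \in Q \,\} \\
\cup\;& \{\, S' \to [q_0, S, f] \mid f \in F \,\}
\end{aligned}
$$

By induction on derivations, $[p, A, q] \Rightarrow^* w$ holds iff $A \Rightarrow^* w$ and $M$ can move from $p$ to $q$ while reading $w$; hence $L(G') = L(G) \cap L(M)$. In CFPQ, the edge-labeled graph itself plays the role of $M$.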

### Evaluation of the Context-Free Path Querying Algorithm Based on Matrix Multiplication

June 2019

Nikita Mishin, Iaroslav Sokolov, Egor Spirin, Vladimir Kutuev, Egor Nemchinov, Sergey Gorbatyuk, Semyon Grigorev

### Path querying on acyclic graphs using Boolean grammars

January 2019

Ekaterina Shemetova, Semyon Grigorev

One of the problems in graph data analysis is querying for specific paths. Such queries are usually specified by a formal grammar that describes the allowed edge labelings of the paths. A path query is said to be evaluated under relational query semantics if its result is the set of triples $(A, v_1, v_2)$ such that there is a path from $v_1$ to $v_2$ whose edge labels form a string derivable from the nonterminal $A$. We focus on Boolean languages, which are described by Boolean grammars, as constraints on path labels. Although path querying with relational query semantics and Boolean grammars is known to be undecidable, in this work we propose a path querying algorithm for acyclic graphs that uses relational query semantics and Boolean grammars and approximates the exact solution. To achieve better performance than the naive algorithm, we restrict the considered class of graphs to acyclic graphs.

### Parser combinators for context-free path querying

September 2018

Ekaterina Verbitskaia, Ilya Kirillov, Ilya Nozkin, Semyon Grigorev

Transparent integration of a domain-specific language for specification of context-free path queries (CFPQs) into a general-purpose programming language, as well as static checking of errors in queries, may greatly simplify the development of applications using CFPQs. LINQ and ORM can be used for the integration, but they have issues with flexibility: query decomposition and reuse of subqueries are a challenge. Adapting the parser combinator technique to path querying may solve these problems. Conventional parser combinators process linear input, and only the Trails library is known to apply this technique to path querying. We demonstrate that it is possible to create general parser combinators for CFPQ which support arbitrary context-free grammars and arbitrary input graphs. We implement a library of such parser combinators and show that it is applicable to realistic tasks.
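
To illustrate how combinators can compose over graph input, here is a deliberately small sketch in which a parser denotes a relation on vertices: `p + q` composes relations (sequencing), `p | q` unions them, and recursive `Rule`s are solved as a least fixpoint, which terminates on cyclic graphs and with left recursion. This is not Meerkat's API or its parsing strategy; all names (`Parser`, `Term`, `Seq`, `Alt`, `Rule`, `run`) are hypothetical, and the sketch returns only vertex pairs, not derivations.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

class Parser:
    """Base class: `p + q` is sequencing, `p | q` is alternation."""
    def __add__(self, other: Parser) -> Parser:
        return Seq(self, other)
    def __or__(self, other: Parser) -> Parser:
        return Alt(self, other)

@dataclass(eq=False)
class Term(Parser):
    label: str                      # matches a single edge with this label

@dataclass(eq=False)
class Seq(Parser):
    left: Parser
    right: Parser

@dataclass(eq=False)
class Alt(Parser):
    left: Parser
    right: Parser

@dataclass(eq=False)
class Rule(Parser):
    name: str
    body: Optional[Parser] = None   # assigned later, so a rule can refer to itself

def denote(p, edges, env):
    """Relation {(u, v)} of parser p, reading each Rule from the current env."""
    if isinstance(p, Term):
        return {(u, v) for u, label, v in edges if label == p.label}
    if isinstance(p, Alt):
        return denote(p.left, edges, env) | denote(p.right, edges, env)
    if isinstance(p, Seq):
        left, right = denote(p.left, edges, env), denote(p.right, edges, env)
        return {(u, w) for u, v in left for v2, w in right if v == v2}
    if isinstance(p, Rule):
        return env[p]
    raise TypeError(f"unknown parser: {p!r}")

def rules_of(p, seen):
    """Collect every Rule reachable from p."""
    if isinstance(p, Rule):
        if p not in seen:
            seen.add(p)
            rules_of(p.body, seen)
    elif isinstance(p, (Seq, Alt)):
        rules_of(p.left, seen)
        rules_of(p.right, seen)
    return seen

def run(start: Rule, edges):
    """Least fixpoint over all rules; relations only grow, so this terminates."""
    rules = rules_of(start, set())
    env = {r: set() for r in rules}
    changed = True
    while changed:
        changed = False
        for r in rules:
            rel = denote(r.body, edges, env)
            if rel != env[r]:
                env[r], changed = rel, True
    return env[start]

a, b = Term("a"), Term("b")
S = Rule("S")
S.body = a + S + b | a + b          # S -> a S b | a b
As = Rule("As")
As.body = As + a | a                # left-recursive: As -> As a | a
# Cyclic graph: self-loop 0 -a-> 0, then 0 -b-> 1 -b-> 2
edges = [(0, "a", 0), (0, "b", 1), (1, "b", 2)]
print(sorted(run(S, edges)))        # [(0, 1), (0, 2)]
print(run(As, edges))               # {(0, 0)}
```

Computing a whole relation per rule keeps the sketch short but evaluates every rule over the entire graph; a practical library would rather parse from given start vertices and memoize intermediate results.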

### Context-Free Path Querying with Structural Representation of Result

December 2017

Semyon Grigorev, Anastasiya Ragozina

There are several solutions for CFPQ, but how to provide a structural representation of the query result that is practical for answer processing and debugging is still an open problem. In this paper, we propose a graph parsing technique which allows one to build such a representation with respect to a given grammar in polynomial time and space for an arbitrary context-free grammar and graph. The proposed algorithm is based on the generalized LL (GLL) parsing algorithm, whereas previous solutions are mostly based on CYK or Earley algorithms; this choice reduces time complexity in some cases.

### Relaxed Parsing of Regular Approximations of String-Embedded Languages

June 2016

Ekaterina Verbitskaia, Semyon Grigorev, Dmitry Avdyukhin

We present a technique for syntax analysis of a regular set of input strings. This problem is relevant for the analysis of string-embedded languages when a host program generates clauses of an embedded language at run time. Our technique is based on a generalization of the RNGLR algorithm, which inherently allows us to construct a finite representation of the parse forest for a regularly approximated set of input strings. This representation can be further utilized for semantic analysis and transformations in the context of reengineering, code maintenance, program understanding, etc. The approach in question implements *relaxed parsing*: non-recognized strings in the approximation set are ignored with no error detection.