Research group

Language Tools Laboratory

Context-Free Path Querying: Algorithms and Applications

Project lead: Semyon Grigorev
Status: Active

This project is a space for the development, evaluation, and comparison of context-free path querying (CFPQ) algorithms.

Main parts:

  • A data set for CFPQ evaluation.
  • Meerkat is a parser combinator library for CFPQ.
  • CoFRA is a CFL-reachability-based framework for developing static analysis tools. It includes ReSharper and Rider plugins as a demo.
  • YaccConstructor is a sandbox for CFPQ algorithms development.

Members

Publications

  • Sergey Bozhko, Leyla Khatbullina, Semyon Grigorev

    The Bar-Hillel theorem states that context-free languages are closed under intersection with a regular set. This theorem has a constructive proof and thus provides a formal justification of the correctness of the algorithms for the applications mentioned above. Mechanization of the Bar-Hillel theorem, therefore, is both a fundamental result of formal language theory and a basis for certified implementations of these algorithms. In this work, we present a mechanized proof of the Bar-Hillel theorem in Coq.

    Logic, Language, Information, and Computation,
  • Nikita Mishin, Iaroslav Sokolov, Egor Spirin, Vladimir Kutuev, Egor Nemchinov, Sergey Gorbatyuk, and Semyon Grigorev

    The recently proposed matrix-multiplication-based algorithm for context-free path querying (CFPQ) offloads the most performance-critical parts onto boolean matrix multiplication. Thus, it is possible to achieve high CFPQ performance by means of modern parallel hardware and software. In this paper, we provide the results of an empirical performance comparison of different implementations of this algorithm on both real-world data and synthetic worst-case data.

    Proceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA),
  • Ekaterina Verbitskaia, Ilya Kirillov, Ilya Nozkin, Semyon Grigorev

    Transparent integration of a domain-specific language for specifying context-free path queries (CFPQs) into a general-purpose programming language, together with static checking of errors in queries, may greatly simplify the development of applications that use CFPQs. LINQ and ORM can be used for the integration, but they lack flexibility: query decomposition and reuse of subqueries are challenging. Adapting the parser combinator technique to path querying may solve these problems. Conventional parser combinators process linear input, and the Trails library is the only known application of this technique to path querying. We demonstrate that it is possible to create general parser combinators for CFPQ that support arbitrary context-free grammars and arbitrary input graphs. We implement a library of such parser combinators and show that it is applicable to realistic tasks.

    Proceedings of the 9th ACM SIGPLAN International Symposium on Scala,
  • Rustam Azimov, Semyon Grigorev
    GRADES-NDA '18 Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA),
  • Semyon Grigorev, Anastasiya Ragozina

    There are several solutions for CFPQ, but providing a structural representation of the query result that is practical for answer processing and debugging is still an open problem. In this paper, we propose a graph parsing technique that builds such a representation with respect to a given grammar in polynomial time and space for an arbitrary context-free grammar and graph. The proposed algorithm is based on the generalized LL (GLL) parsing algorithm, which reduces time complexity in some cases compared with previous solutions, which are based mostly on the CYK or Earley algorithms.

    Proceedings of the 13th Central & Eastern European Software Engineering Conference in Russia (CEE-SECR '17),
  • Ekaterina Verbitskaia, Semyon Grigorev, Dmitry Avdyukhin

    We present a technique for syntax analysis of a regular set of input strings. This problem is relevant to the analysis of string-embedded languages, where a host program generates clauses of the embedded language at run time. Our technique is based on a generalization of the RNGLR algorithm, which inherently allows us to construct a finite representation of the parse forest for a regularly approximated set of input strings. This representation can be further used for semantic analysis and transformations in the context of reengineering, code maintenance, program understanding, etc. The approach implements relaxed parsing: strings in the approximation set that are not recognized are ignored, with no error detection.

    Perspectives of System Informatics,
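
The constructive proof behind the Bar-Hillel theorem (first abstract above) is usually given by the classical triple construction. As a sketch of the textbook version, not necessarily the exact formulation mechanized in Coq, assume a grammar G = (N, Σ, P, S) in Chomsky normal form and an NFA M = (Q, Σ, δ, q0, F) for the regular set; these symbols are introduced here for illustration only. The intersection grammar G' uses triples [p, A, q] as nonterminals plus a fresh start symbol S':

```latex
\begin{aligned}
N' &= \{\, [p, A, q] \mid p, q \in Q,\ A \in N \,\} \cup \{ S' \} \\
P' &= \{\, S' \to [q_0, S, f] \mid f \in F \,\} \\
   &\cup \{\, [p, A, q] \to [p, B, r]\,[r, C, q] \mid (A \to B\,C) \in P,\ p, q, r \in Q \,\} \\
   &\cup \{\, [p, A, q] \to a \mid (A \to a) \in P,\ q \in \delta(p, a) \,\}
\end{aligned}
```

By induction on the length of derivations, [p, A, q] ⇒* w in G' exactly when A ⇒* w in G and M can move from p to q while reading w, so L(G') = L(G) ∩ L(M). The empty word, which a CNF grammar cannot derive, is handled separately.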
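
The matrix-based CFPQ algorithm from the second abstract reduces query evaluation to boolean matrix products. The following is a minimal Python sketch of that idea, assuming a grammar in Chomsky normal form, a graph given as an edge list over vertices 0..n-1, and one boolean matrix per nonterminal; all function names are hypothetical, and the naive triple loop stands in for the optimized parallel matrix libraries the abstract refers to.

```python
from itertools import product


def bool_matmul(a, b):
    """Naive boolean matrix product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cfpq_matrix(n, edges, terminal_rules, binary_rules):
    """All-pairs CFPQ over a graph with vertices 0..n-1.

    edges:          list of (u, label, v)
    terminal_rules: list of (A, a), meaning A -> a
    binary_rules:   list of (A, B, C), meaning A -> B C
    Returns {A: set of (u, v)} such that some u-v path spells a word derivable from A.
    """
    nonterms = {lhs for lhs, _ in terminal_rules}
    nonterms |= {x for rule in binary_rules for x in rule}
    T = {A: [[False] * n for _ in range(n)] for A in nonterms}
    # Initialisation: T[A][u][v] holds if there is an edge u -a-> v and a rule A -> a.
    for u, label, v in edges:
        for A, a in terminal_rules:
            if a == label:
                T[A][u][v] = True
    # Fixpoint: T[A] |= T[B] x T[C] for every rule A -> B C, until nothing changes.
    changed = True
    while changed:
        changed = False
        for A, B, C in binary_rules:
            prod = bool_matmul(T[B], T[C])
            for i, j in product(range(n), repeat=2):
                if prod[i][j] and not T[A][i][j]:
                    T[A][i][j] = True
                    changed = True
    return {A: {(i, j) for i in range(n) for j in range(n) if M[i][j]}
            for A, M in T.items()}


# Hypothetical example: S -> a S b | a b, rewritten in CNF.
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
print(sorted(cfpq_matrix(5, edges, terminal_rules, binary_rules)["S"]))  # [(0, 4), (1, 3)]
```

Because every step only ever sets matrix cells to True, the loop terminates on cyclic graphs as well; the cost is dominated by the products, which is why swapping in a fast boolean matrix library pays off.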
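
The idea of parser combinators over graphs (third abstract above) can be illustrated in miniature. This is a hypothetical Python sketch, not the Meerkat API: the combinators here are recognizers that compute relations between start and end vertices rather than parse trees, and recursion is resolved by a least-fixpoint iteration over named nonterminals, a design choice made for brevity so that left-recursive grammars and cyclic graphs are still handled.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = List[Tuple[int, str, int]]   # edges (u, label, v)
Rel = Set[Tuple[int, int]]           # set of (start, end) vertex pairs
Env = Dict[str, Rel]                 # current approximation of each nonterminal
Parser = Callable[[Graph, Env], Rel]


def term(label: str) -> Parser:
    """Matches exactly one edge carrying `label`."""
    return lambda g, env: {(u, v) for (u, l, v) in g if l == label}


def seq(p: Parser, q: Parser) -> Parser:
    """Runs p, then q from wherever p stopped (relational composition)."""
    def run(g: Graph, env: Env) -> Rel:
        left, right = p(g, env), q(g, env)
        return {(u, w) for (u, v) in left for (v2, w) in right if v == v2}
    return run


def alt(p: Parser, q: Parser) -> Parser:
    """Either alternative: union of both results."""
    return lambda g, env: p(g, env) | q(g, env)


def ref(name: str) -> Parser:
    """Refers to a (possibly recursive) nonterminal by name."""
    return lambda g, env: set(env[name])


def solve(rules: Dict[str, Parser], g: Graph) -> Env:
    """Least fixpoint: start from empty relations and re-run all rules until stable.

    Every combinator is monotone and the vertex set is finite, so this terminates
    even for left-recursive grammars and cyclic graphs.
    """
    env: Env = {name: set() for name in rules}
    while True:
        new = {name: p(g, env) for name, p in rules.items()}
        if new == env:
            return env
        env = new


# Hypothetical grammars: S -> a S b | a b, and the left-recursive L -> L a | a.
S = alt(seq(term("a"), seq(ref("S"), term("b"))), seq(term("a"), term("b")))
L = alt(seq(ref("L"), term("a")), term("a"))
line = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
cycle = [(0, "a", 1), (1, "a", 0)]
print(sorted(solve({"S": S}, line)["S"]))   # [(0, 4), (1, 3)]
print(sorted(solve({"L": L}, cycle)["L"]))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Queries are ordinary values here, so subqueries such as S can be named once and reused inside larger queries, which is the kind of decomposition the abstract argues for.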
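
The GLL-based graph parsing abstract above is about returning a structural representation of query answers rather than bare vertex pairs. A full GLL-based construction is beyond a short example, so the Python sketch below swaps in a simpler worklist algorithm for CNF grammars (not the paper's technique) that stores one derivation per answer instead of a complete parse forest, which is already enough to return one concrete path per answer for debugging. Function names and the grammar encoding are assumptions for illustration.

```python
from collections import deque


def cfpq_witnesses(edges, terminal_rules, binary_rules):
    """Worklist CFPQ for a CNF grammar that keeps one justification per fact.

    A fact (A, u, v) means: some u-v path spells a word derivable from A.
    just[fact] is ("edge", label), or ("split", w, B, C) for a rule A -> B C
    with (B, u, w) and (C, w, v) already derived.
    """
    just = {}
    work = deque()

    def add(fact, reason):
        # Record only the first justification, so every justification refers
        # to facts derived earlier and witness extraction always terminates.
        if fact not in just:
            just[fact] = reason
            work.append(fact)

    for u, label, v in edges:
        for lhs, a in terminal_rules:
            if a == label:
                add((lhs, u, v), ("edge", label))
    while work:
        x, u, v = work.popleft()
        for lhs, left, right in binary_rules:
            if left == x:   # (x, u, v) followed by some (right, v, w)
                for (y, s, w) in list(just):
                    if y == right and s == v:
                        add((lhs, u, w), ("split", v, left, right))
            if right == x:  # some (left, t, u) followed by (x, u, v)
                for (y, t, s) in list(just):
                    if y == left and s == u:
                        add((lhs, t, v), ("split", u, left, right))
    return just


def witness(just, fact):
    """Rebuilds one concrete path (list of edges) for a derived fact."""
    _, u, v = fact
    reason = just[fact]
    if reason[0] == "edge":
        return [(u, reason[1], v)]
    _, w, left, right = reason
    return witness(just, (left, u, w)) + witness(just, (right, w, v))


# Hypothetical example: S -> a S b | a b in CNF, on a graph with two self-loops.
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
edges = [(0, "a", 0), (0, "b", 1), (1, "b", 1)]
just = cfpq_witnesses(edges, terminal_rules, binary_rules)
print(witness(just, ("S", 0, 1)))  # [(0, 'a', 0), (0, 'b', 1)]
```

Keeping only the first justification trades completeness of the representation for simplicity: the answer (0, 1) is backed by infinitely many paths here, and a parse forest would encode all of them finitely, whereas this sketch reports just the shortest-found one.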
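
For the last abstract, a finite representation of all parses of a regularly approximated string set can be illustrated with the explicit Bar-Hillel triple construction, swapped in here in place of the RNGLR-based technique. The Python sketch below assumes a CNF grammar and an NFA given as a set of transitions; names are hypothetical. Pruning unproductive triples mirrors relaxed parsing: strings of the regular set that the grammar does not accept simply disappear from the result.

```python
from itertools import product


def intersect(terminal_rules, binary_rules, start, states, delta, q0, finals):
    """Bar-Hillel triple construction for a CNF grammar and an NFA.

    delta is a set of transitions (p, a, q). Nonterminals of the result are
    triples (p, A, q): "A derives a string that takes the automaton from p to q".
    Unproductive triples are removed, so the result generates exactly the
    strings of the regular set that the grammar accepts.
    """
    t_rules = {((p, A, q), a)
               for (A, a) in terminal_rules for (p, b, q) in delta if a == b}
    b_rules = {((p, A, q), (p, B, r), (r, C, q))
               for (A, B, C) in binary_rules
               for p, r, q in product(states, repeat=3)}
    # Productive triples derive at least one terminal string; compute them by fixpoint.
    productive = {lhs for lhs, _ in t_rules}
    changed = True
    while changed:
        changed = False
        for lhs, x, y in b_rules:
            if lhs not in productive and x in productive and y in productive:
                productive.add(lhs)
                changed = True
    b_rules = {r for r in b_rules if all(nt in productive for nt in r)}
    starts = {(q0, start, f) for f in finals} & productive
    return t_rules, b_rules, starts


terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
# Hypothetical regular approximation a*b+: 0 -a-> 0, 0 -b-> 1, 1 -b-> 1, final state 1.
delta = {(0, "a", 0), (0, "b", 1), (1, "b", 1)}
_, _, starts = intersect(terminal_rules, binary_rules, "S", [0, 1], delta, 0, [1])
print(starts)  # {(0, 'S', 1)}
```

An empty set of start triples means that no string of the approximation parses; otherwise the pruned rules form a grammar whose derivations are exactly the parses of the accepted strings.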

Additional materials

  • Sources and data set on GitHub

    Source code of the matrix-based algorithm implementations and of the testing system, plus the collected data set for CFPQ algorithm evaluation (graphs and queries).