Data guarantees in distributed stream processing
The data guarantees provided by state-of-the-art stream processing systems (Storm, Flink, Spark Streaming) are defined in terms of so-called delivery guarantees. Exactly once is the strongest of these and the most desirable for end users. However, the concept raises several issues. The mechanisms commonly used to enforce exactly once incur significant performance overhead.
Moreover, the notion of exactly once is informal and does not capture all the properties provided by stream processing systems that support this guarantee. In this talk, we introduce a formal framework that allows us to define streaming guarantees in a more systematic way.
We demonstrate that the properties of delivery, consistency, and determinism are tightly connected in distributed stream processing. We also show that, given lightweight determinism, it is possible to provide exactly once with almost no performance overhead.
Speaker: Artem Trofimov.
Presentation language: Russian.
Date and time: April 4th, 20:00-21:30.
Location: Times, room 405.
Videos from seminars will be available at http://bit.ly/MLJBSeminars
23 May 2019: Aggregation of pairwise comparisons with reduction of biases
16 May 2019: Recommendation system for writing texts in a specific domain area
25 April 2019: Topic modeling of a news stream
18 April 2019: Search engine based on word embeddings
11 April 2019: Building execution graphs in distributed systems