After a (very) long time some updates

October 14, 2017

Despite saying that I would try to keep my personal blog updated, I didn’t manage to keep that promise. To motivate myself to do better, here is a quick update on my recent activities.

VLDB 2017 Attendance

The ADMT Lab attended the VLDB 2017 conference, held in Munich, Germany. Our papers were the following:

Detection of Highly Correlated Live Data Streams, by Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis, Mohamed A. Sharaf, and Alexandros Labrinidis.
A Holistic View of Stream Partitioning Costs, by Nikos R. Katsipoulakis, Alexandros Labrinidis, and Panos K. Chrysanthis.
Mison: A Fast JSON Parser for Data Analytics, by Yinan Li, Nikos R. Katsipoulakis, Badrish Chandramouli, Jonathan Goldstein, and Donald Kossmann.

The first paper presents work on fast, online detection of correlated data streams. Rakan and Daniel have written a series of two papers on online time-series processing for finding correlated series in a real-time environment.

The second paper is part of my thesis, which focuses on adaptable Stream Processing Engines. In this paper, I show that incorporating the aggregation costs of stateful operations into the partitioning process of a stream can lead to better decisions. To this end, I propose a new model for stream partitioning and a series of heuristic algorithms based on it. In addition, I found that better decisions can come at a higher memory cost. Hence, I came up with an optimistic HyperLogLog operation, which can significantly limit the memory requirements. Even though this is a simple technique, I haven’t encountered it in other DB papers.

The last paper is joint work with the Database Group at Microsoft Research in Redmond, WA. I spent the summer of 2016 there as an intern and worked on Microsoft’s Trill. Part of the project was Just-In-Time code generation for faster processing of JSON data in a streaming fashion. However, during this internship, Yinan Li and I took a step back and explored ways of improving the parsing process itself. Our efforts led us to Mison, a novel parsing algorithm for JSON data, which achieves 10x better throughput than state-of-the-art parsing tools.

Current work

After working on elasticity and adaptable stream partitioning for Stream Processing Engines, I decided to take a step back and look into load shedding. In my experiments, I have encountered situations in which neither stream partitioning nor elasticity is able to mitigate performance degradation. Unfortunately, previously proposed load shedding techniques are either not applicable to current use cases or are best-effort solutions, which can lead to increased error in the output. Therefore, I am exploring new techniques for informed load shedding. I am excited to share some of my results with you!