
After a (very) long time, some updates

Despite saying that I would try to keep my personal blog updated, I didn’t manage to stick to my promise. To motivate myself to do so, here is a quick update on my recent activities.

VLDB 2017 Attendance

The ADMT Lab attended the VLDB 2017 conference, held in Munich, Germany. Our papers are the following:

The first paper presents work on fast, online detection of correlated data streams. Rakan and Daniel have written a series of two papers on online time-series processing for finding correlated series in a real-time environment.

The second paper is part of my thesis, which focuses on adaptable Stream Processing Engines. In this paper, I show that incorporating the aggregation costs of stateful operations into the partitioning process of a stream can lead to better decisions. To this end, I propose a new model for stream partitioning and a series of heuristic algorithms based on it. In addition, I found that better partitioning decisions can come at a higher memory cost. Hence, I came up with an optimistic HyperLogLog operation, which can significantly limit the memory requirements. Even though this is a simple technique, I haven’t encountered it in other database papers.
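For readers who have not met HyperLogLog before, here is a minimal sketch of the standard algorithm in Python. It is not the optimistic variant from the paper, and the class name, the precision parameter p, and the choice of SHA-1 as the hash function are my own assumptions for illustration. The point is only to show why the memory stays fixed: the sketch keeps 2^p small registers, no matter how many distinct keys it sees.

```python
import hashlib
import math


class HyperLogLog:
    """Standard HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers (fixed memory)
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take a 64-bit hash of the item (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits: register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        # Normalized harmonic mean of 2^register over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw
```

With p = 12 the sketch holds 4096 registers, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Adding the same key twice never changes the registers, so duplicates do not affect the estimate.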

The last paper is joint work with the Database Group at Microsoft Research, Redmond WA. I spent the summer of 2016 there as an intern and worked on Microsoft’s Trill. Part of the project was Just-In-Time code generation for faster processing of JSON data in a streaming fashion. However, during this internship, Yinan Li and I took a step back and explored ways of improving the parsing process itself. Our efforts led us to Mison, a novel parsing algorithm for JSON data, which achieves 10x better throughput than state-of-the-art parsing tools.

Current work

After working on elasticity and adaptable stream partitioning in Stream Processing Engines, I decided to take a step back and look into load shedding. In my experiments, I have encountered situations in which neither stream partitioning nor elasticity is able to prevent performance degradation. Unfortunately, previously proposed load shedding techniques are either not applicable to current use-cases or are best-effort solutions, which can lead to increased error in the output. Therefore, I am exploring new techniques for informed load shedding. I am excited to share some of the results of my techniques with you!
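To make the term concrete, here is a minimal sketch of the simplest best-effort approach: uniform random load shedding. This is not one of the informed techniques I am working on. The function names and the idea of scaling a sum by the sampling rate are assumptions chosen purely for illustration.

```python
import random
from typing import List, Tuple


def shed_random(batch: List[float], capacity: int,
                rng: random.Random) -> Tuple[List[float], float]:
    """Keep at most `capacity` tuples, chosen uniformly at random.

    Returns the kept tuples and the sampling rate that was applied.
    """
    if len(batch) <= capacity:
        return list(batch), 1.0          # no overload, nothing is dropped
    kept = rng.sample(batch, capacity)   # drop the rest without looking at them
    return kept, capacity / len(batch)


def estimated_sum(kept: List[float], rate: float) -> float:
    """Scale the sum of the kept tuples to estimate the sum of the full batch."""
    return sum(kept) / rate
```

Because the shedder picks tuples blindly, the scaled result is only an estimate: the more skewed the values in a batch, the larger the error can become. That is the kind of output error mentioned above.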