
New Paper Presented at IEEE BigData 2018

I am happy to announce that my work on load shedding for current Stream Processing Engines has been accepted for publication in the proceedings of the IEEE International Conference on Big Data 2018. The conference will be held in Seattle (WA) from December 10 to December 13, 2018. The paper is titled “Concept-Driven Load Shedding: Reducing Size and Error of Voluminous and Variable Data Streams” and is the result of my collaboration with Alexandros Labrinidis and Panos K. Chrysanthis. This work is partially funded by the Pitt Smart Living Project.

Load shedding is a technique that aims to ameliorate the consequences of the Velocity and the Volume of Big Data stream processing. When temporary input spikes occur, tuples are shed until the Stream Processing Engine (SPE) is no longer overwhelmed and can again produce results in a timely fashion. Existing load shedding techniques have become obsolete: they are not applicable to modern use cases that require the extraction of patterns from continuously evolving (i.e., Variable) voluminous streams.
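To make the basic idea concrete, here is a minimal sketch of the simplest form of load shedding: uniform random shedding over one batch of tuples. This is a generic baseline and not the technique proposed in the paper. The function name `shed_load` and the `capacity` parameter are hypothetical, and the sketch assumes the SPE's capacity can be expressed as a maximum number of tuples per batch.

```python
import random


def shed_load(batch, capacity, rng=None):
    """Keep at most `capacity` tuples from `batch` (hypothetical helper).

    Tuples are dropped uniformly at random; the survivors keep their
    original arrival order. If the batch fits within the capacity,
    nothing is shed.
    """
    rng = rng or random.Random()
    if len(batch) <= capacity:
        return list(batch)
    # Sample the positions to keep, then restore arrival order.
    keep = sorted(rng.sample(range(len(batch)), capacity))
    return [batch[i] for i in keep]


if __name__ == "__main__":
    spike = list(range(100))            # an input spike of 100 tuples
    survivors = shed_load(spike, 10)    # assumed capacity: 10 tuples
    print(len(survivors), "of", len(spike), "tuples forwarded")
```

Because every tuple is equally likely to be dropped, a shedder like this pays no attention to how the stream's contents change over time.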

In this work, we identify the shortcomings of existing load shedding techniques when applied to streams with concept drift. We propose Concept-Driven load shedding (CoD), which aims to limit the data volume imposed on the SPE while still producing high-accuracy results. In addition, we designed CoD for modern SPEs and made its overhead negligible. Our experiments indicate that CoD can deliver results that are more than 10x more accurate than the state of the art in load shedding. CoD can also offer up to 2.25x better performance compared to normal processing, while significantly reducing the processed data volume.