Mining and learning of temporal predictive patterns

The focus of this project is on the development of data mining and machine learning methods for subgroup discovery: the problem of identifying the patterns in data that are most important for predicting and explaining a specific outcome variable. With the emergence of large datasets in all areas of science, technology, and everyday life, identifying predictive patterns that characterize different subgroups is extremely promising for: (1) knowledge discovery, especially when new, previously unknown subgroups (for example, subpopulations of patients) with significantly different outcomes are identified; and (2) feature engineering, when the patterns found are used as features in classification models that predict the outcome variable.
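To make the feature-engineering use concrete, here is a minimal sketch in Python. It assumes a hypothetical encoding in which each record is a dictionary of discrete attribute values and a pattern is a conjunction of attribute=value conditions. It computes the outcome rate inside the subgroup a pattern covers and turns a list of patterns into binary feature columns for any downstream classifier. The names `matches`, `subgroup_rate`, and `pattern_features` and the toy data are illustrative only, not part of the project's software.

```python
from typing import Dict, List

Record = Dict[str, str]   # attribute -> discrete value (hypothetical encoding)
Pattern = Dict[str, str]  # conjunction of attribute = value conditions

def matches(record: Record, pattern: Pattern) -> bool:
    """True if the record satisfies every condition of the pattern."""
    return all(record.get(attr) == value for attr, value in pattern.items())

def subgroup_rate(records: List[Record], outcomes: List[int], pattern: Pattern) -> float:
    """Fraction of positive outcomes in the subgroup the pattern covers."""
    covered = [y for r, y in zip(records, outcomes) if matches(r, pattern)]
    return sum(covered) / len(covered) if covered else 0.0

def pattern_features(records: List[Record], patterns: List[Pattern]) -> List[List[int]]:
    """One binary column per pattern: 1 if the record belongs to that subgroup."""
    return [[int(matches(r, p)) for p in patterns] for r in records]

# Toy example with made-up attributes.
records = [{"age": "old", "bp": "high"}, {"age": "old", "bp": "normal"},
           {"age": "young", "bp": "high"}, {"age": "young", "bp": "normal"}]
outcomes = [1, 0, 0, 0]
patterns = [{"age": "old"}, {"age": "old", "bp": "high"}]
print(subgroup_rate(records, outcomes, patterns[1]))  # 1.0
print(pattern_features(records, patterns))  # [[1, 1], [1, 0], [0, 0], [0, 0]]
```

The resulting 0/1 matrix can be appended to the original attributes before training a classifier.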

Our research work covers the following three areas:

Minimal predictive pattern mining framework. Standard predictive pattern mining algorithms scan many different patterns for their ability to predict the outcome variable and assess the quality of each pattern with a predictive score. They return all patterns that satisfy a pre-specified threshold on that score. However, many of the patterns selected this way are redundant: they bring little or no new information compared to more general patterns that cover larger populations and are already included in the result. To address this problem, we have developed the minimal predictive pattern mining framework, which eliminates pattern redundancy by selecting only those discriminative patterns whose predictive behavior is significantly different from that of their more general patterns.
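The redundancy filter can be illustrated with a brute-force sketch. It assumes itemset patterns over binary items and uses a one-sided binomial test with a hypothetical significance level `alpha=0.05` as the meaning of "significantly different"; both choices are assumptions made for illustration. A pattern is kept only if its positive-outcome rate is significantly higher than the rate of every more general pattern, including the empty pattern that covers the whole population. The function names are hypothetical, and a practical miner would prune the search rather than enumerate every itemset.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, List, Sequence, Set, Tuple

def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly by summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def coverage(pattern: FrozenSet[str], data: Sequence[Set[str]],
             labels: Sequence[int]) -> Tuple[int, int]:
    """Return (records containing the pattern, positives among them)."""
    covered = [y for items, y in zip(data, labels) if pattern <= items]
    return len(covered), sum(covered)

def is_minimal_predictive(combo: Tuple[str, ...], data, labels, alpha: float) -> bool:
    """Keep the pattern only if its positive rate is significantly higher than
    that of EVERY more general pattern (all proper subsets, including the
    empty pattern, i.e. the whole population)."""
    n, k = coverage(frozenset(combo), data, labels)
    if n == 0:
        return False
    for size in range(len(combo)):
        for sub in combinations(combo, size):
            n_sub, k_sub = coverage(frozenset(sub), data, labels)
            # H0: the pattern's positive rate equals the more general pattern's rate.
            if binom_upper_tail(k, n, k_sub / n_sub) >= alpha:
                return False  # not significantly better, so it is redundant
    return True

def minimal_predictive_patterns(data: Sequence[Set[str]], labels: Sequence[int],
                                max_len: int = 2, alpha: float = 0.05) -> List[FrozenSet[str]]:
    """Brute-force enumeration of all itemsets with up to max_len items."""
    items = sorted(set().union(*data))
    return [frozenset(combo)
            for size in range(1, max_len + 1)
            for combo in combinations(items, size)
            if is_minimal_predictive(combo, data, labels, alpha)]

# Toy data: {a} is predictive; {a, b} is no better than its more general patterns.
data = [{"a", "b"}] * 6 + [{"a"}] * 6 + [{"b"}] * 6 + [set()] * 6
labels = [1] * 5 + [0] + [1] * 5 + [0] + [1] + [0] * 5 + [1] + [0] * 5
print(minimal_predictive_patterns(data, labels))  # [frozenset({'a'})]
```

Here `{a}` raises the positive rate from 12/24 to 10/12 and survives, while `{a, b}` (5/6) is not significantly better than the population rate and is dropped as redundant even though a plain score threshold might have accepted it.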

Mining temporal clinical data. Our work focuses primarily on the analysis of electronic health records (EHRs) and on methods for extracting predictive patterns from EHR data and characterizing how those patterns differ in their predictive power. This is extremely challenging because EHRs consist of complex multivariate time series of observations, tests, and treatments. To construct temporal patterns from complex clinical data, we rely on the temporal abstraction approach, which yields a high-level qualitative description of the time series. The temporal abstractions are then combined with temporal logic to form more complex temporal patterns over the abstracted data.
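The two steps described above, abstraction and combination, can be sketched as follows. This is a minimal illustration, assuming a simple value abstraction with hypothetical Low/Normal/High thresholds and two common temporal relations (`before` and `co_occurs`); the variable names, thresholds, and relation set are illustrative assumptions, not the project's actual abstraction scheme.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class StateInterval:
    """A period during which a variable stays in one abstract state."""
    variable: str
    state: str  # "Low", "Normal" or "High"
    start: float
    end: float

def abstract_series(variable: str, series: Sequence[Tuple[float, float]],
                    low: float, high: float) -> List[StateInterval]:
    """Value abstraction: label each (time, value) sample Low/Normal/High using
    hypothetical thresholds, then merge consecutive samples with the same label
    into one state interval."""
    intervals: List[StateInterval] = []
    for t, v in series:
        state = "Low" if v < low else "High" if v > high else "Normal"
        if intervals and intervals[-1].state == state:
            # Extend the current interval to cover this sample.
            intervals[-1] = StateInterval(variable, state, intervals[-1].start, t)
        else:
            intervals.append(StateInterval(variable, state, t, t))
    return intervals

# Two simple temporal relations used to combine abstractions into patterns.
def before(a: StateInterval, b: StateInterval) -> bool:
    return a.end < b.start

def co_occurs(a: StateInterval, b: StateInterval) -> bool:
    return a.start <= b.end and b.start <= a.end

# Example: variable "A" rises above its normal range while "B" is also high.
a_ivs = abstract_series("A", [(0, 80), (1, 85), (2, 150), (3, 160), (4, 90)], low=70, high=120)
b_ivs = abstract_series("B", [(2.5, 10), (3.5, 12)], low=0, high=5)
print([(iv.state, iv.start, iv.end) for iv in a_ivs])
# [('Normal', 0, 1), ('High', 2, 3), ('Normal', 4, 4)]
print(before(a_ivs[0], a_ivs[1]), co_occurs(a_ivs[1], b_ivs[0]))  # True True
```

A temporal pattern such as "A Normal before A High, with A High co-occurring with B High" is then just a set of state intervals plus these pairwise relations.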

Recent temporal predictive patterns. The space of possible temporal patterns that can be defined on time series data with temporal abstractions is enormous. While the minimal predictive pattern mining framework reduces the number of patterns the algorithm returns, the algorithm may still search through a very large number of candidate patterns. The key challenge is to reduce the size of this search space as much as possible and hence improve the efficiency of the mining algorithms. To address this concern, we proposed a new approach that builds predictive patterns for monitoring and event detection problems using the 'recent predictive pattern' heuristic, which captures the intuition that the most recent information about a clinical variable is likely to be the most important for predicting the future.
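The recency idea can be sketched as a filter on pattern occurrences. This minimal example assumes a pattern is an ordered list of (variable, state) pairs linked by the `before` relation, and it counts an occurrence only if the pattern's final state ends within a hypothetical `max_gap` of the end of the record (the time at which a prediction is made). The function name, the `max_gap` parameter, and the restriction to a single relation are assumptions for illustration; the sketch constrains only the last state and does not claim to reproduce the exact definition used in the project.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class StateInterval(NamedTuple):
    variable: str
    state: str
    start: float
    end: float

def is_recent_occurrence(pattern: Sequence[Tuple[str, str]],
                         intervals: Sequence[StateInterval],
                         record_end: float, max_gap: float) -> bool:
    """Check whether the (variable, state) sequence `pattern` occurs in the
    record with each state ending before the next one starts, and with the
    LAST state ending no more than `max_gap` time units before `record_end`."""
    def search(idx: int, prev: Optional[StateInterval]) -> bool:
        if idx == len(pattern):
            return True
        for iv in intervals:
            if (iv.variable, iv.state) != pattern[idx]:
                continue
            if prev is not None and not prev.end < iv.start:
                continue  # must start after the previous state ends
            if idx == len(pattern) - 1 and record_end - iv.end > max_gap:
                continue  # the final state is too old to count as recent
            if search(idx + 1, iv):
                return True
        return False

    return len(pattern) > 0 and search(0, None)

# Example record whose observation window ends at time 8.
record = [StateInterval("A", "Normal", 0, 1), StateInterval("A", "High", 2, 3),
          StateInterval("B", "High", 6, 8)]
print(is_recent_occurrence([("A", "High"), ("B", "High")], record, record_end=8, max_gap=1))      # True
print(is_recent_occurrence([("A", "Normal"), ("A", "High")], record, record_end=8, max_gap=1))    # False
```

The second pattern does occur in the record, but its last state ended five time units before the prediction time, so under this heuristic it is not treated as evidence for the current prediction.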


CS Members:


This web page is updated by milos