Mining and learning of temporal predictive patterns
The focus of this project is on the development of data mining and machine learning methods for subgroup discovery, which is the problem of identifying patterns in data that are most important for predicting and explaining a specific outcome variable. With the emergence of large datasets in all areas of science, technology and everyday life, identifying predictive patterns that characterize different subgroups is promising for two purposes: (1) knowledge discovery, especially when new, previously unknown subgroups (for example, subpopulations of patients) with significantly different outcomes are identified; and (2) feature engineering, where the patterns found are used as features when building classification models to predict the outcome variable.
Our research work covers the following three areas:
- Development of a minimal predictive pattern mining framework
- Development of a framework for mining temporal clinical data
- A recency heuristic for efficient mining of temporal patterns
Minimal predictive pattern mining framework. Standard predictive pattern mining algorithms scan many candidate patterns and assess how well each one predicts the outcome variable in terms of some predictive score. These algorithms return every pattern that satisfies a pre-specified predictive score threshold. However, many predictive patterns selected this way are redundant: they add little or no new information compared to more general patterns that represent larger populations and are already included in the result. To address this problem we have developed the minimal predictive pattern mining framework, which eliminates pattern redundancies by selecting only those discriminative patterns that are significantly different from more general patterns.
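The redundancy filter described above can be illustrated with a small sketch. This is not the project's algorithm: it assumes binary outcomes, treats a pattern as a set of items, and swaps in a one-sided binomial test as the "significantly different" criterion. The names `minimal_predictive_patterns`, `max_len`, `alpha` and the toy data in `demo_rows` are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def counts(pattern, rows):
    """Return (positives, coverage) among the rows that contain the pattern."""
    covered = [label for items, label in rows if pattern <= items]
    return sum(covered), len(covered)


def minimal_predictive_patterns(rows, max_len=2, alpha=0.05):
    """Keep a pattern only if it predicts the positive outcome significantly
    better than every more general pattern (all proper subsets, including
    the empty pattern, which covers the whole population)."""
    all_items = sorted(set().union(*(items for items, _ in rows)))
    kept = []
    for size in range(1, max_len + 1):
        for combo in combinations(all_items, size):
            pos, n = counts(frozenset(combo), rows)
            if n == 0:
                continue
            is_minimal = True
            for k in range(size):
                for sub in combinations(combo, k):
                    sub_pos, sub_n = counts(frozenset(sub), rows)
                    # Assumed test: is the pattern's positive count unusually
                    # high given the positive rate of the more general pattern?
                    if binom_tail(pos, n, sub_pos / sub_n) > alpha:
                        is_minimal = False
            if is_minimal:
                kept.append(frozenset(combo))
    return kept


# Toy data (hypothetical): item "a" drives the outcome, item "b" does not.
demo_rows = (
    [(frozenset("ab"), 1)] * 9 + [(frozenset("ab"), 0)]
    + [(frozenset("a"), 1)] * 9 + [(frozenset("a"), 0)]
    + [(frozenset("b"), 1)] + [(frozenset("b"), 0)] * 9
    + [(frozenset(), 1)] + [(frozenset(), 0)] * 9
)
found = minimal_predictive_patterns(demo_rows)
print(found)
```

On this toy data the pattern {a, b} is predictive against the whole population, but it is no better than the more general pattern {a}, so only {a} is kept.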
Mining temporal clinical data. Our work focuses primarily on the analysis of electronic health records (EHRs) and on methods for extracting predictive patterns that characterize EHR data and their predictive differences. This is challenging because EHRs consist of complex multivariate time series of observations, tests, and treatments. To construct temporal patterns from complex clinical data we rely on the temporal abstraction approach to obtain a high-level qualitative description of the time series. The temporal abstractions are then combined with temporal logic to form more complex temporal patterns for the abstracted data.
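As a rough illustration of temporal abstraction, the sketch below maps numeric samples to qualitative state intervals and then checks simple temporal relations between them. The states ("L", "N", "H"), the thresholds, the two relations `before` and `co_occurs`, and the sample values are assumptions chosen for the example, not details taken from the project.

```python
from typing import List, NamedTuple, Tuple


class StateInterval(NamedTuple):
    variable: str
    state: str      # "L" (low), "N" (normal) or "H" (high)
    start: float
    end: float


def value_abstraction(variable: str, samples: List[Tuple[float, float]],
                      low: float, high: float) -> List[StateInterval]:
    """Turn time-ordered (time, value) samples into state intervals,
    merging consecutive samples that fall into the same state."""
    intervals: List[StateInterval] = []
    for t, v in samples:
        state = "L" if v < low else "H" if v > high else "N"
        if intervals and intervals[-1].state == state:
            intervals[-1] = intervals[-1]._replace(end=t)
        else:
            intervals.append(StateInterval(variable, state, t, t))
    return intervals


def before(a: StateInterval, b: StateInterval) -> bool:
    """a ends strictly before b starts."""
    return a.end < b.start


def co_occurs(a: StateInterval, b: StateInterval) -> bool:
    """a and b share at least one time point."""
    return a.start <= b.end and b.start <= a.end


def holds(relation, first_state, second_state, intervals_a, intervals_b) -> bool:
    """True if some interval of intervals_a in first_state and some interval
    of intervals_b in second_state satisfy the given temporal relation."""
    return any(
        relation(a, b)
        for a in intervals_a if a.state == first_state
        for b in intervals_b if b.state == second_state
    )


# Hypothetical lab values and reference ranges.
glucose = value_abstraction(
    "glucose", [(0, 90), (1, 95), (2, 180), (3, 190), (4, 100)], low=70, high=140)
creatinine = value_abstraction(
    "creatinine", [(0, 1.0), (3, 2.0), (5, 2.2)], low=0.6, high=1.3)

print(glucose)
print(holds(co_occurs, "H", "H", glucose, creatinine))
print(holds(before, "N", "H", glucose, creatinine))
```

A temporal pattern such as "glucose is high while creatinine is high" then becomes a boolean feature of the record, which is the form a classifier can consume.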
Recent temporal predictive patterns. The space of possible temporal patterns that can be defined on time series data with the help of temporal abstractions is enormous. While the minimal predictive pattern mining framework reduces the number of patterns the algorithm returns, the algorithm may still have to search through a very large number of candidate patterns. The key challenge is to shrink this search space as much as possible and thereby improve the efficiency of the mining algorithms. To address this concern we proposed a new approach that builds predictive patterns for monitoring and event detection problems using the 'recent predictive pattern' heuristic, which captures the intuition that the most recent information about a clinical variable is likely to be the most important for predicting future events.
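One way to picture the recency heuristic is as a filter on the state intervals from which patterns may be built. The sketch below is a simplified reading, not the published definition: it treats an interval as recent if it is the last one observed for its variable or if it ends within `max_gap` time units of the end of the record. The function name, the `max_gap` parameter and the example record are hypothetical.

```python
from typing import List, NamedTuple


class StateInterval(NamedTuple):
    variable: str
    state: str
    start: float
    end: float


def recent_intervals(record: List[StateInterval], t_end: float,
                     max_gap: float) -> List[StateInterval]:
    """Keep intervals that are the latest for their variable, or that end
    no more than max_gap before the end of the record (t_end)."""
    last_end = {}
    for iv in record:
        last_end[iv.variable] = max(iv.end, last_end.get(iv.variable, iv.end))
    return [
        iv for iv in record
        if iv.end == last_end[iv.variable] or t_end - iv.end <= max_gap
    ]


# Hypothetical abstracted record for one patient, observed up to time 5.
record = [
    StateInterval("glucose", "N", 0, 1),
    StateInterval("glucose", "H", 2, 3),
    StateInterval("glucose", "N", 4, 4),
    StateInterval("creatinine", "N", 0, 0),
    StateInterval("creatinine", "H", 3, 5),
]
recent = recent_intervals(record, t_end=5, max_gap=1)
print(recent)
```

Here only two of the five intervals remain as starting points for pattern construction, which shows how restricting attention to recent information can shrink the space of candidate patterns.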
Funding:
- NIH. 1R01LM010019. Using medical records repositories to improve the alert system design. PI: Hauskrecht, September 2009 - September 2013.
- NIH. 1R01GM088224-01. Detecting deviations in clinical care in ICU data stream. PIs: Hauskrecht and Clermont, August 2009 - June 2013.
CS Members:
, PhD,
Professor of Computer Science
, former PhD student, currently at Microsoft Inc.
Riccardo Bellazzi, University of Pavia
Gregory F. Cooper, Department of Biomedical Informatics, University of Pittsburgh
D. Fradkin, Siemens Corporation
Fabian Moerchen, Siemens Corporation
Lucia Sacchi, University of Pavia
Publications:
- I. Batal, G. Cooper, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht.
An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data.
Knowledge and Information Systems, 2014.
- I. Batal, H. Valizadegan, G. Cooper and M. Hauskrecht.
A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data.
Transactions on Intelligent Systems and Technology, Special Issue on Health Informatics, 4(4), 2013.
- I. Batal, G. Cooper, and M. Hauskrecht.
A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Bristol, UK, September 2012.
- I. Batal, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht.
Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data.
The 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Beijing, China, August 2012.
- I. Batal, H. Valizadegan, G. F. Cooper, and M. Hauskrecht.
A Pattern Mining Approach for Classifying Multivariate Temporal Data.
IEEE International Conference on Bioinformatics and Biomedicine, Atlanta, Georgia, November 2011.
- I. Batal and M. Hauskrecht.
Mining Clinical Data using Minimal Predictive Rules.
Annual American Medical Informatics Association (AMIA) Symposium, November 2010.
- I. Batal, M. Hauskrecht.
Constructing Classification Features using Minimal Predictive Patterns.
Proceedings of the 19th ACM International Conference on Information and Knowledge Management, November 2010, pp. 869-878.
- I. Batal, M. Hauskrecht.
A Concise Representation of Association Rules using Minimal Predictive Rules.
European Conference on Machine Learning and Knowledge Discovery in Databases, September 2010.
- I. Batal, L. Sacchi, R. Bellazzi, and M. Hauskrecht.
A Temporal Abstraction Framework for Classifying Clinical Temporal Data.
Annual American Medical Informatics Association (AMIA) Symposium, 2009.
- I. Batal, L. Sacchi, R. Bellazzi, and M. Hauskrecht.
Multivariate Time Series Classification with Temporal Abstractions.
In Proceedings of the Twenty-Second International Florida AI Research Society Conference (FLAIRS 2009), May 2009.
This web page is updated by milos