Machine learning and data mining for bioinformatics applications

High-throughput genomic and proteomic profiling holds great promise for improving the early detection and diagnosis of many diseases, and for helping to optimize therapeutic options for patients suffering from various maladies. The high-throughput nature of genomic and proteomic data, however, presents a number of computational difficulties even for the most avid explorer. The number of potential biomarkers (genes, mass spectrometry (MS) profile peaks) that discriminate between disease and control samples can be large, yet only a few of them may carry a biologically significant signal. Identifying the features (signals) that are likely to provide useful information about a disease, as well as the relations among these features, remains an important open research question. Our aim is to develop and advance machine learning solutions that scale well to the high-dimensional data characteristic of bioinformatics sources. Specific problems we are interested in include the discovery and validation of potential disease biomarkers, component (latent variable model) analysis of data, and the construction of multivariate classification models for the early detection or diagnosis of diseases based on such data.
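To make the "many candidates, few real signals" problem concrete, here is a minimal sketch of one standard univariate screening approach. It is an illustration only, not the method used in this project. It assumes a samples-by-features matrix for each class, ranks features with a Welch t-statistic, converts the statistics to two-sided p-values with a standard-normal approximation (rather than the exact t distribution), and controls the false discovery rate with the Benjamini-Hochberg procedure. The helper names (welch_t, normal_two_sided_p, benjamini_hochberg) and the simulation settings (2000 features, 40 samples per class, 10 informative features shifted by 2.0) are hypothetical choices for the example, not project data.

```python
import math

import numpy as np


def welch_t(x, y):
    """Per-feature Welch t-statistics for two (samples x features) matrices."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    return (mx - my) / np.sqrt(vx / x.shape[0] + vy / y.shape[0])


def normal_two_sided_p(t):
    """Two-sided p-values, approximating the t distribution by a standard normal."""
    # 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])


def benjamini_hochberg(p, q=0.05):
    """Boolean mask of features declared significant at false discovery rate q."""
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing the step-up test
        mask[order[:k + 1]] = True
    return mask


# Synthetic data: an assumption for illustration, not real genomic/proteomic data.
rng = np.random.default_rng(0)
n_per_class, n_features, n_informative = 40, 2000, 10
controls = rng.normal(size=(n_per_class, n_features))
cases = rng.normal(size=(n_per_class, n_features))
cases[:, :n_informative] += 2.0  # only the first 10 features carry signal

t = welch_t(cases, controls)
p = normal_two_sided_p(t)
selected = np.nonzero(benjamini_hochberg(p, q=0.05))[0]
print(len(selected), "features selected;",
      int(np.sum(selected < n_informative)), "of them truly informative")
```

In a sketch like this, nearly all of the 2000 candidate features are discarded and the handful that survive are dominated by the truly informative ones; the few null features that may slip through illustrate why screened candidates still need independent validation before being treated as biomarkers. Univariate screening also ignores relations among features, which is one reason multivariate and latent variable models are of interest.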


CS people who worked on the project:

Project funding:

Project collaborators:


Posters with our collaborators:

The web page is updated by milos.