Name:CLUSTER
Description:Chaotic map clustering
Abstract:CMC is a tool for finding clusters in large collections of microarray data sets. Microarray data are a rich source of information: each array contains the expression values of thousands of genes for a well-defined state of a cell or tissue. Thousands of arrays are publicly available and ready for analysis, which makes it possible to study correlations between genes at the level of gene expression extensively. With such a large variety of experiments available, one can start from a large data set and extract relevant information for further analysis. Clustering is a suitable but challenging analysis method for data sets of this complexity and size. We choose an unsupervised hierarchical algorithm, Chaotic Map Clustering, in a coupled two-way approach to analyse the data. The clustering approach is nevertheless intrinsically difficult, because the structure of the data is unknown and the clustering results are hard to interpret. It is therefore fundamental to evaluate the quality of an unsupervised procedure applied to such a mixed set of data and to validate the clustering results, separating out those clusters that are due to noise or statistical fluctuations. The CMC tool uses a resampling method to perform this validation. The resampling procedure applies the clustering algorithm to a large number of random sub-samples of the original data matrix, so the whole process becomes computationally intensive and time-consuming. Grid technology drastically speeds up this process: the clustering of each matrix is distributed to a separate worker node, and the resampling results are retrieved within a few hours instead of several days.
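
The resampling validation described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the CMC implementation: the clustering step is a simple distance-threshold stand-in rather than Chaotic Map Clustering, and all names (`threshold_clusters`, `cluster_subsample`, `cluster_stability`, `radius`, `fraction`, `run`) are hypothetical. Each sub-sample is an independent task, which is the property that would allow one task per worker node.

```python
import random
from itertools import combinations


def threshold_clusters(points, radius):
    """Stand-in clustering (NOT Chaotic Map Clustering): connected components
    of the graph that links any two points closer than `radius`."""
    parent = {i: i for i in points}

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(list(points), 2):
        dist = sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5
        if dist < radius:
            parent[find(a)] = find(b)
    return {i: find(i) for i in points}


def cluster_subsample(task):
    """One independent unit of work: cluster a random sub-sample of the rows."""
    points, fraction, radius, seed = task
    rng = random.Random(seed)
    size = min(len(points), max(2, int(fraction * len(points))))
    keep = rng.sample(sorted(points), size)
    return threshold_clusters({i: points[i] for i in keep}, radius)


def cluster_stability(points, cluster, n_resamples=50, fraction=0.8,
                      radius=1.0, run=map):
    """Fraction of resamples in which pairs from `cluster` stay together.

    `run` applies a function to every task; the default is the built-in
    `map` (sequential). A parallel map could be passed instead.
    """
    tasks = [(points, fraction, radius, seed) for seed in range(n_resamples)]
    together = seen = 0
    for labels in run(cluster_subsample, tasks):
        for a, b in combinations(cluster, 2):
            if a in labels and b in labels:  # both rows were sampled
                seen += 1
                together += labels[a] == labels[b]
    return together / seen if seen else 0.0
```

A candidate cluster with a score close to 1 survives resampling, while a score close to 0 suggests the grouping is due to noise. Because `cluster_subsample` takes a single picklable argument, passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `run` would be one way to spread the tasks over local processes, as a small-scale analogue of the Grid distribution.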

Created:2010-05-01
Last updated:2013-09-04