Gene expression data clustering and its application in differential analysis of leukemia

authors:

Hamid Alavi Majd *, Hamid Alavi Majd, Yalda Mehrabi, Bahar Taghavi


how to cite: Alavi Majd H, Alavi Majd H, Mehrabi Y, Taghavi B. Gene expression data clustering and it’s application in differential analysis of leukemia. Koomesh. 2008;9(2):e152187.

Abstract

Introduction: DNA microarray technology is one of the most important tools in bioinformatics. By making it possible to monitor the expression of thousands of genes simultaneously, it has recently led to the creation of very large gene expression databases. Statistical analysis of such databases includes normalization, clustering, and classification.

Materials and Methods: Golub et al. (1999) collected a leukemia data set using oligonucleotide microarrays; the data are publicly available on the internet. In this paper, we analyzed these gene expression data, clustering them by several methods, including multidimensional scaling, hierarchical clustering, and non-hierarchical clustering. The data set included 20 acute lymphoblastic leukemia (ALL) patients and 14 acute myeloid leukemia (AML) patients. The results of two clustering methods (divisive hierarchical clustering and partitioning around medoids) were compared with the real grouping (ALL and AML). R software was used for data analysis.

Results: Specificity and sensitivity of divisive hierarchical clustering in identifying ALL patients were 75% and 92%, respectively. Specificity and sensitivity of partitioning around medoids in identifying ALL patients were 90% and 93%, respectively. These results indicate that both clustering methods performed well. Notably, the clustering methods placed one sample in the ALL group that had been assigned to the AML group by clinical testing.

Conclusion: Because the clustering results agree well with the real grouping of the data, these methods can be used in cases where accurate information about the real grouping is not available. Moreover, clustering may reveal distinct subgroups in the data, which would then need to be checked for concordance with clinical outcomes, laboratory results, and so on.
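The comparison of a two-cluster solution with the real grouping can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration, and the function names (`align_clusters`, `sensitivity_specificity`) and the toy labels are hypothetical and not taken from the paper or from the Golub data. It assumes each cluster is assigned to the diagnosis held by the majority of its members, after which sensitivity and specificity for ALL are computed in the usual way.

```python
from collections import Counter


def align_clusters(true_labels, cluster_ids):
    """Relabel each cluster with the majority true label of its members."""
    mapping = {}
    for cid in set(cluster_ids):
        members = [t for t, c in zip(true_labels, cluster_ids) if c == cid]
        mapping[cid] = Counter(members).most_common(1)[0][0]
    return [mapping[c] for c in cluster_ids]


def sensitivity_specificity(true_labels, predicted, positive="ALL"):
    """Return (sensitivity, specificity) treating `positive` as the target class."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical labels, NOT the study data):
# 5 ALL and 4 AML samples, with one sample of each class misplaced.
truth = ["ALL"] * 5 + ["AML"] * 4
clusters = [1, 1, 1, 1, 2, 2, 2, 2, 1]

predicted = align_clusters(truth, clusters)
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Majority-vote alignment is only one way to match clusters to diagnoses. With two clusters and two classes it is usually unambiguous, but ties or a cluster count other than two would need an explicit rule.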