Application of fuzzy clustering in analysis of included proteins in esophagus, stomach and colon cancers based on similarity of Gene Ontology annotation

authors:

zarnegarnia zarnegarnia *, Hamid Alavi Majd, Mostafa Rezaie Tavirani, Nasibe Kheyr, Ali akbar KhademMabodi


how to cite: zarnegarnia Z, Alavi Majd H, Rezaie Tavirani M, Kheyr N, KhademMabodi A A. Application of fuzzy clustering in analysis of included proteins in esophagus, stomach and colon cancers based on similarity of Gene Ontology annotation. koomesh. 2010;12(1):e152430. 

Abstract

Introduction: The large volume of proteomics data now being produced calls for new analytical procedures. Collective analysis of proteins can help identify new annotation patterns in a dataset, but this type of analysis is also time-consuming. Cluster analysis, as a suitable statistical procedure, can be used to analyze such datasets. The objective of this study was to evaluate the efficiency of fuzzy clustering in recognizing new patterns among proteins related to gastrointestinal cancers.

Materials and Methods: Fuzzy clustering was used to analyze the proteins identified as involved in esophagus, stomach and colon cancers. Proteins were clustered on each of the three aspects of Gene Ontology (GO), and the results were compared.

Results: Fuzzy clustering was implemented, and the non-fuzziness indexes based on biological process, cellular component and molecular function were 0.41, 0.55 and 0.35, respectively. The index based on molecular function showed the efficiency of the fuzzy clustering method. Although the silhouette widths for the entire dataset were not substantial, most of the proteins in each cluster shared notable biological commonalities. When Term Enrichment software was used to determine statistically enriched GO terms in the entire dataset and in the clusters, it became clear that fuzzy clustering had revealed novel annotation patterns within the dataset that would not have been identified otherwise.

Conclusion: The fuzzy clustering outputs demonstrate the efficiency of this method for better and more flexible analysis of proteins. Because fuzzy clustering placed the more similar proteins together with high membership probabilities, it can be used in situations where some proteins have unknown characteristics. Furthermore, proteins clustered by their cellular component similarities also appear to share biological and functional similarities, which requires further investigation.
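
The abstract does not define the non-fuzziness index reported in the Results. As an illustration only, and assuming it refers to the normalized Dunn partition coefficient that is commonly reported for fuzzy partitions, the sketch below shows how such an index could be computed from a membership matrix. The function name `normalized_partition_coefficient` and the example memberships are hypothetical and are not taken from the study.

```python
def normalized_partition_coefficient(memberships):
    """Normalized Dunn partition coefficient of a fuzzy partition.

    memberships: list of rows, one per protein; each row holds that
    protein's membership in each of the k clusters and sums to 1.
    Returns a value in [0, 1]: 0 for a completely fuzzy partition
    (all memberships equal to 1/k), 1 for a crisp partition.
    NOTE: assumed definition, not confirmed by the abstract.
    """
    n = len(memberships)
    k = len(memberships[0])
    if n == 0 or k < 2:
        raise ValueError("need at least one row and two clusters")
    # Raw partition coefficient: mean of squared memberships per object.
    pc = sum(u * u for row in memberships for u in row) / n
    # Rescale from the range [1/k, 1] to the range [0, 1].
    return (pc - 1.0 / k) / (1.0 - 1.0 / k)


# Hypothetical memberships for two proteins and two clusters.
example = [[0.8, 0.2], [0.3, 0.7]]
index = normalized_partition_coefficient(example)
print(round(index, 2))
```

Under this assumed definition, values closer to 1 indicate that proteins are assigned to a single cluster with high membership, while values closer to 0 indicate that memberships are spread evenly across clusters.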