Comparison of Supervised Clustering Methods for the Analysis of DNA Microarray Expression Data
Authors: XIAO Jing, WANG Xue-feng, YANG Ze-feng, XU Chen-wu
Institution: 1. Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, P.R. China; The School of Public Health, Nantong University, Nantong 226001, P.R. China
2. Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, P.R. China
Abstract: Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), k-nearest-neighbor (KNN), binary support vector machines (SVMs), and multiclass support vector machines (MC-SVMs), were employed to classify computer-simulated data and two real microarray expression datasets. The numbers of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN), the clustering accuracy, and Matthews' correlation coefficient (MCC) were compared among these methods. The results are as follows: (1) In classifying thousands of gene expression data, the two GMM methods achieve the highest clustering accuracy and the smallest overall number of FP and FN errors, on the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-II method is more accurate than the GMM-I method. (2) In general, the MC-SVMs are more robust and more practical: they are less sensitive to the curse of dimensionality, they are second only to the GMM methods in clustering accuracy on thousands of gene expression data, and they are more robust than the other techniques to a small number of high-dimensional gene expression samples. (3) Among the MC-SVMs, OVO and DAGSVM perform better on large sample sizes, whereas the five MC-SVM methods perform very similarly on moderate sample sizes; in other cases, OVR, WW, and CS yield better results when sample sizes are small. It is therefore recommended that at least two candidate methods, chosen on the basis of the features of the real data and the experimental conditions, be performed and compared to obtain a better clustering result.
Keywords: microarray; supervised clustering; k-nearest-neighbor (KNN); support vector machines (SVMs)
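
The evaluation measures named in the abstract (FP, FN, TP, TN, clustering accuracy, and MCC) can be illustrated with a minimal Python sketch. It assumes a two-class setting and the standard binary definition of Matthews' correlation coefficient; the abstract does not state how the authors computed these measures for more than two classes, and the function names and toy labels below are hypothetical, chosen only for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for one class treated as 'positive' (illustrative helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of samples assigned to the correct class."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """Standard binary Matthews' correlation coefficient, in [-1, 1].

    Returns 0.0 when the denominator is zero, a common convention
    (an assumption here, not something stated in the abstract).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("TP, FP, TN, FN:", counts)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four counts, so it remains informative when one class is much larger than the other, which is one plausible reason for reporting both measures.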
This article is indexed in VIP, Wanfang Data, ScienceDirect, and other databases.