Comparison of Supervised Clustering Methods for the Analysis of DNA Microarray Expression Data

Authors:	XIAO Jing, WANG Xue-feng, YANG Ze-feng, XU Chen-wu

Institution:	1. Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, P.R. China; School of Public Health, Nantong University, Nantong 226001, P.R. China
	2. Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, P.R. China

Abstract:	Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), k-nearest-neighbor (KNN), binary support vector machines (SVMs) and multiclass support vector machines (MC-SVMs), were used to classify computer-simulated data and two real microarray expression datasets. The methods were compared in terms of false positives (FP), false negatives (FN), true positives (TP), true negatives (TN), clustering accuracy and Matthews' correlation coefficient (MCC). The results are as follows. (1) When classifying expression data for thousands of genes, the two GMM methods achieve the highest clustering accuracy and the fewest total FP and FN errors, under the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-II method achieves higher clustering accuracy than the GMM-I method. (2) In general, the MC-SVMs give robust and practical classification performance: they are less sensitive to the curse of dimensionality, rank second only to GMM in clustering accuracy on expression data for thousands of genes, and are more robust than the other techniques when only a small number of high-dimensional gene expression samples is available. (3) Among the MC-SVMs, one-versus-one (OVO) and directed acyclic graph SVM (DAGSVM) perform better on large sample sizes, whereas all five MC-SVM methods perform very similarly on moderate sample sizes; when sample sizes are small, one-versus-rest (OVR), Weston-Watkins (WW) and Crammer-Singer (CS) yield better results. Therefore, we recommend applying and comparing at least two candidate methods, chosen according to the features of the real data and the experimental conditions, to obtain a better clustering result.
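The comparison above rests on six evaluation measures. As a minimal sketch, the snippet below shows how TP, TN, FP, FN, accuracy and MCC are usually computed for a binary labelling (for example, whether each gene is assigned to a class of interest). It assumes the standard textbook definitions, with accuracy = (TP + TN) / (TP + TN + FP + FN) and MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); the abstract does not state the paper's exact formulas, and the function names `confusion_counts`, `accuracy` and `mcc` are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, TN, FP, FN), treating `positive` as the positive class (assumed binary labels)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, truly positive
            else:
                fp += 1  # predicted positive, truly negative
        else:
            if t == positive:
                fn += 1  # predicted negative, truly positive
            else:
                tn += 1  # predicted negative, truly negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of items assigned to the correct class (standard definition, assumed)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient; returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for illustration only; not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), mcc(*counts))  # prints (3, 3, 1, 1) 0.75 0.5
```

Because MCC uses all four counts, it stays informative when one class dominates, which is common when only a small fraction of the genes on a microarray belong to the class of interest; accuracy alone can look high in that situation simply by favouring the majority class.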

Keywords:	microarray; supervised clustering; k-nearest-neighbor (KNN); support vector machines (SVMs)
This article is indexed in VIP, Wanfang Data, ScienceDirect and other databases.