Variable Selection and Outlier Detection for Automated K-means Clustering
Author: Sung-Soo Kim
Journal: Communications for Statistical Applications and Methods, 2015, Vol.22 (1)
Source database: Communications for Statistical Applications and Methods
DOI: 10.5351/CSAM.2015.22.1.055
Keywords: Automated K-means clustering; variable selection; outlier detecting; VS-KM; adjusted Rand index; Mahalanobis distance.
Abstract (original language): An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating noisy variables that mask it; in addition, outlier detection is a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and...
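Step (ii) of the abstract identifies outliers within each cluster, and the keywords name the Mahalanobis distance. The abstract does not state the paper's actual decision rule, so the following is only a minimal sketch of one common approach: compute each point's Mahalanobis distance to its own cluster center and flag points beyond a fixed cutoff. The function name `flag_cluster_outliers` and the `cutoff` value are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import numpy as np


def flag_cluster_outliers(X, labels, cutoff=3.0):
    """Flag points far from their own cluster center in Mahalanobis distance.

    `cutoff` is an illustrative threshold (an assumption), not the paper's rule.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    is_outlier = np.zeros(len(X), dtype=bool)

    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if len(idx) < 2:
            continue  # covariance is undefined for a single point
        members = X[idx]
        diff = members - members.mean(axis=0)
        # Sample covariance of the variables currently in use for this cluster.
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        # The pseudo-inverse tolerates a singular covariance matrix.
        inv_cov = np.linalg.pinv(cov)
        # Squared Mahalanobis distance for every member of the cluster.
        d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        is_outlier[idx] = np.sqrt(d2) > cutoff

    return is_outlier


# Two tight 5x5 grids of points; the first cluster also holds one distant point.
grid = [(-0.2 + 0.1 * i, -0.2 + 0.1 * j) for i in range(5) for j in range(5)]
X = grid + [(5.0, 5.0)] + [(10 + a, 10 + b) for a, b in grid]
labels = [0] * 26 + [1] * 25
flags = flag_cluster_outliers(X, labels)
```

Because the distances depend on which columns of `X` are passed in, rerunning the function after a variable is added can change which points are flagged, which matches the abstract's remark that outliers depend on the variables used.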