Partitioning-based clustering for Web document categorization
Authors: Daniel Boley, Maria Gini, Robert Gross, Eui-Hong (Sam) Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, Jerome Moore
Affiliation: Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Journal: Decision Support Systems, 1999, Vol. 27 (3), pp. 329-341
Source database: Elsevier Journal
DOI: 10.1016/S0167-9236(99)00055-X
Keywords: Clustering; Categorization; World Wide Web documents; Graph partitioning; Association rules; Principal component analysis
Abstract: Clustering techniques have been used by many intelligent software agents in order to retrieve, filter, and categorize documents available on the World Wide Web. Clustering is also useful in extracting salient features of related Web documents to automatically formulate queries and search for other similar documents on the Web. Traditional clustering algorithms either use a priori knowledge of document structures to define a distance or similarity among these documents, or use probabilistic techniques such as Bayesian classification. Many of these traditional algorithms, however, falter when the dimensionality of the feature space becomes high relative to the size of the document space. In this paper, we introduce two new clustering algorithms that can effectively cluster...
Impact factor: 2.201 (2012)

Key terms:
  • clustering
  • categorization
  • Web
  • document
  • partitioning
  • feature
  • dimensionality
  • hierarchical
  • automatically
  • probabilistic
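The abstract notes that traditional clustering algorithms define a distance or similarity among documents, and that they falter when the feature space is high-dimensional relative to the number of documents. As a minimal illustration, and not the paper's method or either of its two new algorithms, the Python sketch below represents each document as a bag-of-words term-count vector (each distinct word is one dimension) and compares documents with cosine similarity, a common document similarity measure. The whitespace tokenizer, the function names, and the sample documents are hypothetical choices made only for this example.

```python
import math
from collections import Counter

def term_vector(text: str) -> Counter:
    # Hypothetical bag-of-words tokenizer: lowercase, split on whitespace, count words.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over terms present in both documents; absent terms contribute 0.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot / (norm_a * norm_b)

# Hypothetical sample documents.
docs = [
    "web document clustering with graph partitioning",
    "clustering web documents by principal directions",
    "stock prices fell sharply today",
]
vectors = [term_vector(d) for d in docs]

# Every distinct word across the collection is one feature dimension.
vocabulary = set().union(*vectors)
print(f"{len(docs)} documents, {len(vocabulary)} dimensions")

# Pairwise similarities between all documents.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine_similarity(vectors[i], vectors[j]), 3))
```

Even these three short documents produce 15 dimensions, far more than the number of documents, which hints at the high-dimensionality problem the abstract describes. Documents 0 and 1 score 1/3 because they share only "web" and "clustering"; "document" and "documents" count as different features because the sketch does no stemming.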