Research on and Improvement of the DBSCAN Clustering Algorithm
An Improved DBSCAN Clustering Algorithm
Abstract
To address the shortcomings of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved algorithm is proposed that combines a divide-and-conquer strategy with efficient parallel methods. The data are first partitioned, and the divide-and-conquer approach reduces the influence of the global parameter Eps. Parallel processing and dimensionality reduction are used to improve clustering efficiency and to reduce the high memory requirements of DBSCAN. Finally, incremental processing is adopted to handle the effect that inserting or deleting data objects has on the clustering. The results show that the new method effectively resolves these problems of DBSCAN: both its clustering efficiency and its cluster quality are markedly better than those of the traditional DBSCAN algorithm.
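The divide-and-conquer idea can be illustrated with a minimal sketch. The abstract does not say how the data are partitioned or how Eps is chosen within each partition, so everything beyond textbook DBSCAN here is an assumption made for illustration: the partitions are supplied by hand, and the helper names `local_eps`, `cluster_by_partition`, and `grid`, as well as the rule "1.5 times the median distance to the (MinPts - 1)-th nearest neighbour", are hypothetical and are not the authors' method.

```python
from math import dist
from statistics import median

NOISE = -1

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be relabelled as a border point
            continue
        labels[i] = cluster
        stack = seeds
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:     # core point: keep expanding
                stack.extend(nbrs)
        cluster += 1
    return labels

def local_eps(points, min_pts, scale=1.5):
    """Hypothetical heuristic (an assumption, not from the paper):
    scale * median distance to the (min_pts - 1)-th nearest neighbour.
    Requires at least min_pts points in the partition."""
    k = min_pts - 1
    kth = [sorted(dist(p, q) for q in points)[k] for p in points]
    return scale * median(kth)

def cluster_by_partition(partitions, min_pts):
    """Run DBSCAN separately on each partition, each with its own Eps."""
    out = []
    for part in partitions:
        eps = local_eps(part, min_pts)
        out.append((eps, dbscan(part, eps, min_pts)))
    return out

def grid(x0, step):
    # 3 x 3 block of points starting at (x0, 0) with the given spacing.
    return [(x0 + step * x, step * y) for x in range(3) for y in range(3)]

dense = grid(0, 1) + grid(10, 1)    # two tight groups, 8 units apart
sparse = grid(100, 10)              # one loose group
results = cluster_by_partition([dense, sparse], min_pts=4)
```

On this toy data, a single global Eps small enough to keep the two tight groups apart marks every point of the loose group as noise, while one large enough to cluster the loose group merges the two tight groups. Choosing Eps per partition avoids that trade-off. Because each partition is processed independently, the loop in `cluster_by_partition` could also be handed to parallel workers.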