王颖,杨余旺.基于堆和邻域共存信息的KNN相似图算法[J].计算机科学,2018,45(5):196-200, 227
基于堆和邻域共存信息的KNN相似图算法
KNN Similarity Graph Algorithm Based on Heap and Neighborhood Coexistence
投稿时间:2017-02-24  修订日期:2017-05-05
DOI:10.11896/j.issn.1002-137X.2018.05.033
中文关键词:  谱聚类,相似图,堆,稀疏K近邻
英文关键词:Spectral clustering,Similarity graph,Heap,Sparse k nearest neighborhood
基金项目:本文受江苏省科技支撑计划(BE2012386,BE2011342),江苏省农业自主创新项目(CX(13)3054,CX(16)1006),江苏省普通高校研究生创新计划项目(KYLX16_0464)资助
作者单位E-mail
王颖 南京理工大学计算机科学与工程学院 南京210094  
杨余旺 南京理工大学计算机科学与工程学院 南京210094 yuwangyang@njust.edu.cn 
摘要点击次数: 287
全文下载次数: 214
中文摘要:
      在谱聚类算法中,相似图的构造至关重要,对整个算法的聚类结果和运行效率都有着巨大影响。为了加快谱聚类的运算速度和通过近邻截断提高其性能,通常选择K近邻(KNN)方法来构造稀疏的相似图,而K近邻图对离群点非常敏感,这种噪声边会严重影响聚类算法的性能。文中提出了一种新的高效稀疏亲和图构造方法HCKNN,其中基于堆的K近邻搜索比基于排序的近邻选择在效率方面提升了log(n),基于邻域共存累计的阈值化来进行邻域约减不仅能够去除噪声边以提高聚类性能,还能进一步稀疏化相似矩阵,从而加速谱聚类中的特征分解。
英文摘要:
      In the spectral clustering algorithm,the construction of similarity graph is very important,which has a great impact on the clustering results and operation efficiency of algorithm.In order to speed up the operation of spectral clustering and improve the performance by nearest neighbor truncation,K nearest neighbor(KNN) method is usually used to construct the sparse similarity graph.But the K nearest neighbor graph is very sensitive to outliers in the data,and the noise will seriously affect the clustering performance.This paper presented a new efficient sparse affinity graph construction method HCKNN.In this method,the K nearest neighbor search based on heap is more efficient than the nearest neighbor selection process based on sort by log(n),and the sparse similarity matrix reduction based on the neighborhood coexistence cumulative threshold can not only remove the noise to enhance the performance of clustering,but also accelerate the eigenvector decomposition in spectral clustering.This paper proposed a new efficient method HCKNN for constructing sparse affinity graph based on heap and the information of two points in the same neighborhood,which can not only choose the neighbors more efficiently,but also can accelerate the spectral clustering because of the sparse matrix.
查看全文  查看/发表评论  下载PDF阅读器