Yang Meijiao, Liu Jinglei. Preference Clustering Based on Nyström Sampling and Convex-NMF [J]. Computer Science, 2018, 45(1): 55-61, 78
基于Nystrm采样和凸NMF的偏好聚类
Preference Clustering Based on Nystrm Sampling and Convex-NMF
Received: 2017-03-03  Revised: 2017-06-24
DOI:10.11896/j.issn.1002-137X.2018.01.008
Keywords: Nyström method, Convex non-negative matrix factorization, Preference clustering, Clustering center, Clustering indicator
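Funding: This work was supported by the National Natural Science Foundation of China (61572419, 8, 61403328, 9) and the Natural Science Foundation of Shandong Province (ZR2014FQ016, ZR2014FQ026, 5GSF115009, ZR2013FM011).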
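Author  Affiliation  E-mail
Yang Meijiao  School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005
Liu Jinglei  School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005  jinglei_liu@sina.com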
Abstract:
      Large-scale sparse graph data, such as collaborative graphs and Laplacian matrices, arise frequently in practice. Non-negative matrix factorization (NMF) has become an important tool in data mining, information retrieval and signal processing. As data volumes keep growing, how to perform preference clustering on large-scale data is an important problem. This paper adopts a two-stage method for large-scale preference clustering. First, the Nyström approximate sampling method is used to obtain an initial profile of the data, yielding a partial user-user or movie-movie similarity matrix, so that the original high-dimensional space is reduced to a low-dimensional subspace. Second, convex non-negative matrix factorization is applied to the low-dimensional similarity matrix to obtain the cluster centers and the cluster indicators: the centers represent the features of movies or users, and the indicators represent the weights of those features. The advantage of this two-stage method is that the approximate acquisition of the initial data profile, combined with convex NMF, gives it good robustness and noise resistance. In addition, because the subspace data come from actual rows and columns of the matrix, the clustering results are readily interpretable. Using the Nyström method also addresses the problem that large-scale data cannot be held in memory, which greatly saves memory and improves running efficiency. Finally, preference clustering on a movie data set containing 100,000 ratings demonstrates the effectiveness of the algorithm.
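The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `nystrom_embedding` and `convex_nmf`, the use of cosine similarity, uniform landmark sampling, random initialization, and the synthetic rating matrix are all assumptions made for illustration. The second stage uses the multiplicative updates commonly given for convex NMF (X ≈ XWGᵀ with W, G ≥ 0), and for clarity it forms the full Gram matrix, which a truly large-scale implementation would avoid.

```python
import numpy as np

def nystrom_embedding(R, m, rank, rng):
    """Stage 1 (assumed form): Nystrom approximation of the cosine-similarity
    matrix K of the rows of R, built from m uniformly sampled landmark rows.
    Returns Z (n x r, r <= rank) such that Z @ Z.T approximates K."""
    n = R.shape[0]
    Rn = R / np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
    idx = rng.choice(n, size=m, replace=False)   # landmark rows
    C = Rn @ Rn[idx].T                           # n x m block of K
    Wm = C[idx]                                  # m x m landmark block
    evals, evecs = np.linalg.eigh(Wm)
    top = np.argsort(evals)[::-1][:rank]         # largest eigenvalues first
    top = top[evals[top] > 1e-8 * evals.max()]   # drop numerically zero modes
    # K ~ C Wm^+ C^T = (C U L^{-1/2}) (C U L^{-1/2})^T
    return C @ evecs[:, top] / np.sqrt(evals[top])

def convex_nmf(X, k, n_iter=200, rng=None, eps=1e-9):
    """Stage 2 (assumed form): convex NMF, X ~ X @ W @ G.T with W, G >= 0.
    X is (features x samples) and may have mixed signs.
    X @ W are the cluster centers; G is the cluster indicator."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[1]
    A = X.T @ X                                  # n x n Gram matrix (demo only)
    Ap = (np.abs(A) + A) / 2                     # positive part of A
    An = (np.abs(A) - A) / 2                     # negative part of A
    G = rng.random((n, k)) + 0.1
    W = rng.random((n, k)) + 0.1
    for _ in range(n_iter):
        ApW, AnW = Ap @ W, An @ W
        G *= np.sqrt((ApW + G @ (W.T @ AnW) + eps) /
                     (AnW + G @ (W.T @ ApW) + eps))
        GtG = G.T @ G
        W *= np.sqrt((Ap @ G + (An @ W) @ GtG + eps) /
                     (An @ G + (Ap @ W) @ GtG + eps))
    return W, G

# Synthetic user x movie ratings with two taste groups (illustrative data only).
rng = np.random.default_rng(0)
R = rng.integers(1, 3, size=(60, 40)).astype(float)  # baseline ratings 1-2
R[:30, :20] += 3                                     # group 1 likes movies 0-19
R[30:, 20:] += 3                                     # group 2 likes movies 20-39

Z = nystrom_embedding(R, m=15, rank=2, rng=rng)      # low-dimensional profile
W, G = convex_nmf(Z.T, k=2, rng=rng)
centers = Z.T @ W                                    # one column per cluster
labels = G.argmax(axis=1)                            # hard assignment per user
print(centers.shape, labels.shape)
```

Because each center is a non-negative combination of actual data columns, it stays in the span of real users (or movies), which is the interpretability property the abstract refers to. Only the sampled n x m block of the similarity matrix is computed in stage 1, which is where the memory saving comes from.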