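Chen Fucai, Li Sihao, Zhang Jianpeng, Huang Ruiyang. Multi-label Feature Selection Algorithm Based on Improved Label Correlation[J]. Computer Science, 2018, 45(6): 228-234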
Multi-label Feature Selection Algorithm Based on Improved Label Correlation
Received: 2017-04-25  Revised: 2017-07-29
DOI:10.11896/j.issn.1002-137X.2018.06.041
Keywords: Multi-label feature selection, Label correlation, Dependency, Redundancy, Feature score
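Funding: This work was supported by the National Key Research and Development Program of China (2016YFB0800101) and the National Natural Science Foundation of China Innovative Research Groups Program (61521003).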
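Author  Affiliation  E-mail
Chen Fucai  National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002  1242100831@qq.com
Li Sihao  National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002  michaelbournelisihao@outlook.com
Zhang Jianpeng  National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002
Huang Ruiyang  National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002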
Abstract:
      Multi-label feature selection is one of the main methods for coping with the curse of dimensionality: it reduces the feature dimension while improving learning efficiency and classification performance. However, many existing feature selection algorithms rarely take the correlation between labels into account, and the range over which they measure information is biased across data sets. To address these problems, this paper proposes a multi-label feature selection algorithm based on improved label correlation. The algorithm first uses symmetrical uncertainty to normalize the information measures, then takes the normalized mutual information as the measure of correlation and uses it to define an importance weight for each label, with which the label-related terms in the dependency and the redundancy are weighted. It then introduces a feature score function as the index of feature importance and builds the best feature subset by successively selecting the highest-scoring features. Experimental results show that, compared with other algorithms, the more accurate low-dimensional feature subset extracted by this algorithm not only effectively improves the performance of multi-label learning algorithms for entity information mining, but also improves the efficiency of multi-label learning algorithms based on discrete features.
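The pipeline described in the abstract can be illustrated with a minimal Python sketch. Only the symmetrical uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), is a standard definition. Everything else here is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, the label weight is taken to be a label's normalized total SU with the other labels, and the score is a simplified mRMR-style difference (weighted dependency minus mean feature-to-feature SU) that omits the label-related redundancy terms the abstract mentions. Features and labels are assumed to be discrete columns.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    # I(X; Y) = H(X) + H(Y) - H(X, Y); clamp tiny negative rounding errors.
    mi = max(0.0, hx + hy - entropy(list(zip(xs, ys))))
    return 2.0 * mi / (hx + hy)


def label_weights(labels):
    """ASSUMPTION: a label's weight is its normalized total SU with the others."""
    q = len(labels)
    raw = [
        sum(symmetrical_uncertainty(labels[i], labels[j])
            for j in range(q) if j != i)
        for i in range(q)
    ]
    total = sum(raw)
    if total == 0:
        return [1.0 / q] * q  # uncorrelated labels: fall back to uniform weights
    return [r / total for r in raw]


def select_features(features, labels, k):
    """Greedily pick k feature indices by an assumed score function.

    ASSUMPTION: score = weighted dependency - mean SU with selected features.
    The paper's score function is not given in the abstract.
    """
    weights = label_weights(labels)
    dependency = [
        sum(w * symmetrical_uncertainty(f, l) for w, l in zip(weights, labels))
        for f in features
    ]
    selected, remaining = [], list(range(len(features)))

    def score(i):
        if not selected:
            return dependency[i]
        redundancy = sum(
            symmetrical_uncertainty(features[i], features[j]) for j in selected
        ) / len(selected)
        return dependency[i] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: two independent labels; feature 1 duplicates feature 0.
l0 = [0, 0, 1, 1, 0, 0, 1, 1]
l1 = [0, 1, 0, 1, 0, 1, 0, 1]
print(select_features([l0, list(l0), l1], [l0, l1], k=2))  # [0, 2]
```

In the toy run, the duplicated feature is skipped because its redundancy with the already selected feature cancels its dependency, which is the behaviour the dependency/redundancy trade-off is meant to produce.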