DONG Hongbin, SHI Li, LI Tao. Improved Ensemble Method on MicroRNA Prediction Model [J]. Computer Science, 2018, 45(2): 69-75
Improved Ensemble Method on MicroRNA Prediction Model
Submitted: 2017-05-10  Revised: 2017-06-28
DOI:10.11896/j.issn.1002-137X.2018.02.012
Keywords: MicroRNA, Prediction, Sampling, Feature selection, Class imbalance
Funding: This work was supported by the National Natural Science Foundation of China (61472095).
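Author  Affiliation  E-mail
DONG Hongbin  College of Computer Science and Technology, Harbin Engineering University, Harbin 150001  donghongbin@hrbeu.edu.cn
SHI Li  College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
LI Tao  College of Computer Science and Technology, Harbin Engineering University, Harbin 150001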
Abstract:
      Existing microRNA prediction methods often suffer from class-imbalanced data sets and apply to only a single species. To address these problems, this work makes four contributions. First, a stratified sampling algorithm based on sequence entropy is proposed, which generates a training set with balanced numbers of positive and negative samples while preserving the overall distribution of the samples. Second, a feature selection method based on signal-to-noise ratio and correlation is proposed to reduce the size of the training set and thereby speed up training. Third, the DS-GA algorithm is proposed to shorten the time needed to optimize the SVM classifier parameters and to reduce over-fitting. Finally, following the idea of ensemble learning, a cross-species microRNA prediction model is built through three steps: sampling, feature selection, and classifier parameter optimization. Experiments show that the model addresses the class imbalance problem effectively, is not limited to a single species, and achieves good results on a mixed-species test set.
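The entropy-based sampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not define the entropy measure or how the strata are formed, so everything below is an assumption made for illustration: `sequence_entropy` uses the Shannon entropy of nucleotide composition, `stratified_sample` uses equal-width entropy bins with proportional allocation, and both function names and the parameters `n_bins` and `max_entropy` are hypothetical, not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def sequence_entropy(seq):
    """Shannon entropy (in bits) of the nucleotide composition of seq.

    Assumption: the paper's "sequence entropy" may be defined differently.
    """
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stratified_sample(seqs, n_samples, n_bins=4, max_entropy=2.0, seed=0):
    """Draw roughly n_samples sequences while keeping each entropy bin's share.

    Assumption: strata are equal-width bins over [0, max_entropy], where
    2.0 bits is the maximum for a four-letter alphabet.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for s in seqs:
        idx = min(int(sequence_entropy(s) / max_entropy * n_bins), n_bins - 1)
        bins[idx].append(s)

    sample = []
    for idx in sorted(bins):
        members = bins[idx]
        # Proportional allocation, with at least one draw per non-empty bin,
        # so the total can differ slightly from n_samples after rounding.
        k = max(1, round(n_samples * len(members) / len(seqs)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample
```

To balance a training set under these assumptions, one would call `stratified_sample(negatives, len(positives))` so that the majority class is reduced to about the size of the minority class while its entropy distribution is roughly preserved.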