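Wang Zhanbing, Song Wei, Peng Zhiyong, Yang Xiandi, Cui Yihui, Shen Yuan. Subsequence Outsourcing Query Method over Encrypted Genomic Data [J]. Computer Science (计算机科学), 2018, 45(6): 51-56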
Subsequence Outsourcing Query Method over Encrypted Genomic Data
(Chinese title: 一种面向密文基因数据的子序列外包查询方法)
Received: 2017-03-11  Revised: 2017-07-04
DOI:10.11896/j.issn.1002-137X.2018.06.009
Keywords: precision medicine, subsequence query, ciphertext query, full-text retrieval
Funding: This work was supported by the National Natural Science Foundation of China (61232002,8).
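Authors (name; affiliation; e-mail):
Wang Zhanbing; School of Computer Science, Wuhan University, Wuhan 430070; bingo711x@whu.edu.cn
Song Wei; School of Computer Science, Wuhan University, Wuhan 430070
Peng Zhiyong; School of Computer Science, Wuhan University, Wuhan 430070; peng@whu.edu.cn
Yang Xiandi; School of Computer Science, Wuhan University, Wuhan 430070; xiandiy@whu.edu.cn
Cui Yihui; School of Computer Science, Wuhan University, Wuhan 430070; cuiyihui@whu.edu.cn
Shen Yuan; School of Computer Science, Wuhan University, Wuhan 430070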
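Chinese abstract:
      Precision medicine is a medical model that relies heavily on the results of patient genome analysis, and substring search is an important method for performing that analysis. In recent years the volume of genomic data has grown sharply, and its storage cost and processing complexity now far exceed what medical institutions can bear. Outsourcing genomic data to a cloud service provider, to take advantage of its inexpensive storage and powerful computing capability, has therefore become a practical solution. Because the cloud service provider is not fully trusted, encrypting the data before uploading it to the cloud is an effective way to guarantee data security and privacy. However, how to perform sequence search over encrypted data then becomes an urgent problem. To address it, this paper surveys the fields of genomic data processing and ciphertext retrieval and proposes a scheme that uses q-grams to create prefix signatures for fixed-length windows of the sequence data, and that answers a query by performing a prefix query in each window. During a subsequence query, the cloud cannot obtain the plaintext of the user's data. Finally, experiments verify that the proposed scheme has good performance and storage overhead; for example, with a window size of 100 and q = 6, building the index for a sequence of length 100000 takes 15.06 s. Compared with GPSE, the proposed method performs better.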
English abstract:
      Precision medicine is a medical model that relies heavily on patient genome analysis, and subsequence search plays an important role in performing that analysis. Recently, the amount of genomic data has been increasing dramatically, and its storage cost and processing complexity are far beyond the capacity of hospitals. Using powerful cloud computing capability to analyze and process such massive genomic sequence data is therefore becoming popular. Because the cloud service provider is not completely trusted, encrypting genomic data before uploading it is a straightforward and effective way to guarantee the privacy and security of DNA sequence data. However, performing queries over the encrypted genomic sequence data then becomes another difficult problem. To address this problem, the paper surveys the fields of genomic data processing and full-text retrieval. It constructs indexes on fixed-length windows of the genomic sequence using q-gram mapping and performs the query in every window: if the query sequence is a prefix of any window of the genomic sequence, the query hits. Throughout the process, the cloud service provider stores the indexes and performs the subsequence query without obtaining any private details. The paper also defines the system model and several security assumptions, and proves the security of the scheme. Experiments on a public dataset evaluate the performance of the scheme. The results show that the proposed solution achieves better time cost and storage cost; e.g., when w is 100 and q is 6, the index-building algorithm takes 15.60 s for a sequence of length 100000. Compared with GPSE, the proposed solution executes queries more efficiently.
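The window-and-prefix idea described in the abstract can be illustrated with a small sketch. This is not the paper's construction: the abstract does not give the signature or encryption details, so the sketch assumes a window starting at every sequence position and uses HMAC-SHA256 tokens over (offset, q-gram) pairs as a stand-in for the prefix signature. The names `build_index`, `make_trapdoor`, and `search` are hypothetical, and a real scheme would need a security analysis that this toy version does not have.

```python
import hashlib
import hmac


def qgrams(s, q):
    """Return all overlapping substrings of length q, in order."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def token(key, offset, gram):
    """Keyed token for a q-gram at a given offset inside a window.

    Assumption: HMAC-SHA256 stands in for the paper's signature primitive.
    """
    msg = f"{offset}:{gram}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def build_index(key, sequence, w, q):
    """Data owner side: one token set per window of length at most w.

    Assumption: a window starts at every position that leaves at least
    q characters, so any occurrence of a query is a prefix of some window.
    """
    index = []
    for start in range(len(sequence) - q + 1):
        window = sequence[start:start + w]
        index.append({token(key, i, g) for i, g in enumerate(qgrams(window, q))})
    return index


def make_trapdoor(key, query, q):
    """Querier side: tokens for the positional q-grams of the query."""
    if len(query) < q:
        raise ValueError("query must contain at least q characters")
    return {token(key, i, g) for i, g in enumerate(qgrams(query, q))}


def search(index, trapdoor):
    """Cloud side: a window hits when it contains every query token,
    i.e. when the query is a prefix of that window. Only tokens are used."""
    return [start for start, sig in enumerate(index) if trapdoor <= sig]


key = b"demo-key-known-only-to-the-data-owner"
sequence = "ACGTACGTTACG"
index = build_index(key, sequence, w=8, q=3)
hits = search(index, make_trapdoor(key, "TACG", q=3))
print(hits)
```

In this sketch the query length must lie between q and w, and the index holds roughly w - q + 1 tokens per position, so storage grows with both the sequence length and the window size. The server sees only keyed tokens, but repeated q-grams at the same offset yield identical tokens, which is one reason this should be read as an illustration of the lookup logic rather than a secure design.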