[1] CHEN Chun-ling*, JIANG Hui-min, GUO Yong-an. Medical Sensitive Text Classification Based on Two-stage Feature Selection[J]. Computer Technology and Development, 2020, 30(08): 129-133. [doi:10.3969/j.issn.1673-629X.2020.08.022]

Medical Sensitive Text Classification Based on Two-stage Feature Selection

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 08
Pages:
129-133
Column:
Application Development Research
Publication date:
2020-08-10

Article Info

Title:
Medical Sensitive Text Classification Based on Two-stage Feature Selection
Article ID:
1673-629X(2020)08-0129-05
Author(s):
CHEN Chun-ling (陈春玲)*1, JIANG Hui-min (姜慧敏)1, GUO Yong-an (郭永安)2
1. School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 2. School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
medical big data; privacy protection; feature selection; sensitive data; text classification
Classification number:
TP18
DOI:
10.3969/j.issn.1673-629X.2020.08.022
Abstract:
To classify medical data by sensitivity, this paper studies the privacy protection of medical information from the perspective of classifying sensitive medical data, using text classification techniques. Building on traditional medical text classification, it proposes a text classification method based on LSI-TF-IDF two-stage feature selection for the sensitivity classification of medical text data. The traditional TF-IDF-based method and the proposed LSI-TF-IDF two-stage method are each applied to the sensitivity classification of diabetes text data, and three classifiers (naive Bayes, KNN and SVM) are compared experimentally, with precision, recall and F1 value as the evaluation criteria. The experimental results show that the LSI-TF-IDF two-stage method improves on the traditional TF-IDF-based method in precision, recall and F1 value, which shows that it classifies the sensitivity of medical text data more effectively.
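The abstract names TF-IDF weighting and the precision, recall and F1 criteria without giving formulas. The following is a minimal Python sketch of both, assuming the textbook definitions (tf as relative term frequency, idf as log(N/df)). The paper's exact variants, its LSI stage, and the way the two stages are combined are not stated in the abstract and are not reproduced here. The function names and the toy documents and labels are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / doc_length and idf = log(N / df). Smoothed or
    normalized variants are common and may be what the paper used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up data for illustration only (1 = sensitive, 0 = not sensitive).
docs = [
    ["patient", "insulin", "dose"],
    ["clinic", "opening", "hours"],
    ["patient", "glucose", "record"],
]
weights = tf_idf(docs)
scores = precision_recall_f1([1, 0, 1, 1], [1, 1, 1, 0])
```

Under these definitions a term that occurs in every document gets weight zero, so rarer terms dominate the feature vector passed to a classifier.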

Last Update: 2020-08-10