[1] ZHENG Cheng-yu, WANG Xin, WANG Ting, et al. Medical Text Classification Based on Transfer Learning and Ensemble Learning[J]. Computer Technology and Development, 2022, 32(04): 28-33. [doi:10.3969/j.issn.1673-629X.2022.04.005]

Medical Text Classification Based on Transfer Learning and Ensemble Learning

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 04
Pages:
28-33
Column:
Big Data Analysis and Mining
Publication date:
2022-04-10

Article Info

Title:
Medical Text Classification Based on Transfer Learning and Ensemble Learning
Article ID:
1673-629X(2022)04-0028-06
Author(s):
ZHENG Cheng-yu, WANG Xin, WANG Ting, XU Quan-feng
School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China
Keywords:
transfer learning; ensemble learning; ALBERT; Bi-LSTM-CNN; medical text; health question
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2022.04.005
Abstract:
To address the semantic sparsity and high dimensionality of medical text, this paper proposes TLCM (Trans-LSTM-CNN-Multi), a multi-label medical text classification algorithm based on transfer learning and ensemble learning. First, the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) model is trained on a large-scale corpus to obtain dynamic character vector representations of general-domain text. Then, transfer learning and model fine-tuning on a target data set from the medical field are used to enhance the text semantics of the ALBERT pre-trained language model for that field. On this basis, the semantic enhancement model obtained through transfer learning feeds into the Bi-LSTM-CNN ensemble learning module, which further extracts important information features from the medical text. Finally, a multi-label text classifier built on the binary cross-entropy loss function performs the medical text classification. Experimental results show that the TLCM algorithm, which combines transfer learning and ensemble learning, effectively improves medical text classification performance, reaching an overall F1 value of 91.8% on the Chinese health question data set.
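
The last two steps of the abstract, a multi-label classifier trained with binary cross-entropy and evaluated by F1, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the toy logits, and the choice of micro-averaging for the "overall" F1 are all assumptions made for illustration, and the ALBERT and Bi-LSTM-CNN stages that would produce the logits are omitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over the labels of one sample.

    Each label is treated as an independent yes/no decision, which is
    what lets a single sample carry several labels at once.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Clamp the probability so log() never receives 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Turn per-label logits into 0/1 decisions (threshold is an assumption)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every sample and label."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical logits for two questions and three labels.
    logits = [[2.0, -1.0, 0.0], [-3.0, 1.5, 0.5]]
    targets = [[1, 0, 1], [0, 1, 0]]

    losses = [binary_cross_entropy(z, y) for z, y in zip(logits, targets)]
    preds = [predict_labels(z) for z in logits]
    print("mean loss:", sum(losses) / len(losses))
    print("micro F1:", micro_f1(targets, preds))
```

In practice the logits would come from the network's final layer, and a framework loss operating on whole batches would replace the per-sample loop shown here.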

Last Update: 2022-04-10