[1] LI Zheng-hui, LIAO Guang-zhong. Chinese Medical Entity Recognition Based on Multi-level Feature Extraction[J]. Computer Technology and Development, 2023, 33(09): 119-125. [doi:10.3969/j.issn.1673-629X.2023.09.018]

Chinese Medical Entity Recognition Based on Multi-level Feature Extraction

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
33
Issue:
2023, No. 09
Pages:
119-125
Column:
Artificial Intelligence
Publication date:
2023-09-10

Article Info

Title:
Chinese Medical Entity Recognition Based on Multi-level Feature Extraction
Article ID:
1673-629X(2023)09-0119-07
Author(s):
LI Zheng-hui¹, LIAO Guang-zhong²
1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China;
2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan 430065, China
Keywords:
entity recognition; BERT pre-training; iterated dilated convolutional network (IDCNN); attention mechanism; receptive field
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2023.09.018
Abstract:
Chinese medical entity recognition is the basis of text information processing in the medical field. However, Chinese medical texts often contain irregular grammar, nested entities, and easily confused entity types, all of which can reduce recognition accuracy, so ensuring the accuracy of Chinese medical entity recognition has considerable theoretical and practical value. To this end, we propose an entity recognition model that combines BERT pre-training, a bidirectional long short-term memory network (BILSTM), and an iterated dilated convolutional network (IDCNN) with an attention mechanism. First, the BERT pre-trained language model converts Chinese characters into word vectors and enhances their grammatical and semantic features. The trained word vectors are then passed through the BILSTM network to obtain contextual information and through the IDCNN network with the attention mechanism to obtain a larger receptive field. Finally, the features carrying grammatical and semantic features, contextual information, and larger receptive field information are fused and fed into a conditional random field (CRF) for entity prediction. Experiments on two public medical datasets, CMeEE and Yidu-S4K, show that the model reaches F1 values of 0.7116 and 0.8206, respectively, which are 1.40 and 2.29 percentage points higher than those of mainstream models, validating its effectiveness for Chinese medical entity recognition.
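The abstract attributes the "larger receptive field" to the dilated convolution branch. The following is a minimal pure-Python sketch of why stacking dilated 1-D convolutions widens the span of tokens each output position can see. It is an illustration only, not the authors' implementation: the kernel size of 3 and the dilation schedule [1, 1, 2] are assumptions (the abstract gives neither), and the helper names `receptive_field`, `dilated_conv1d`, and `impulse_span` are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in tokens, of stacked stride-1 dilated 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer extends the visible span by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(xs, weights, dilation):
    """Zero-padded, length-preserving dilated 1-D convolution over a list of floats."""
    center = len(weights) // 2
    out = []
    for i in range(len(xs)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(xs):
                acc += w * xs[idx]
        out.append(acc)
    return out


def impulse_span(kernel_size, dilations, length=41):
    """Push a single impulse through the stack and measure how far it spreads."""
    xs = [0.0] * length
    xs[length // 2] = 1.0
    for d in dilations:
        xs = dilated_conv1d(xs, [1.0] * kernel_size, d)
    touched = [i for i, v in enumerate(xs) if v != 0.0]
    return touched[-1] - touched[0] + 1


# Assumed settings: kernel size 3, dilations [1, 1, 2] versus an undilated stack.
print(receptive_field(3, [1, 1, 2]))  # 9
print(receptive_field(3, [1, 1, 1]))  # 7
print(impulse_span(3, [1, 1, 2]))     # 9
```

With the same three layers and the same number of weights, the dilated stack covers 9 tokens where the undilated stack covers 7; the gap grows quickly as dilations increase, which is the property the model relies on to complement the BILSTM's contextual features.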

Last Update: 2023-09-10