[1] HE Tao, CHEN Jian, WEN Ying-you, et al. Multi-label Classification of Judicial Short Texts Based on Stacking Model[J]. Computer Technology and Development, 2021, 31(03): 27-32. [doi:10.3969/j.issn.1673-629X.2021.03.005]

Multi-label Classification of Judicial Short Texts Based on Stacking Model

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 03
Pages:
27-32
Section:
Big Data Analysis and Mining
Publication date:
2021-03-10

Article Info

Title:
Multi-label Classification of Judicial Short Texts Based on Stacking Model
Article ID:
1673-629X(2021)03-0027-06
Author(s):
HE Tao 1, CHEN Jian 1, WEN Ying-you 1, KONG Wei-min 2
1. Neusoft Research, Northeastern University, Shenyang, Liaoning 110169, China;
2. People's Procuratorate of Dingtao, Heze, Shandong 274100, China
Keywords:
stacking model; bidirectional encoder representations from transformers (BERT); convolutional neural network; gated recurrent unit; multi-label classification
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2021.03.005
Abstract:
Short texts in judicial documents are semantically diverse and have sparse features, which makes accurate multi-label classification difficult; traditional single-model classification algorithms can no longer meet business needs. To address this, we propose a multi-label classification method that combines deep learning with a stacking model. The method organizes the classifiers into two layers. In the first layer, deep learning methods such as BERT, a convolutional neural network, and a gated recurrent unit serve as base classifiers. Each base classifier produces multi-label classification probabilities for all of the data through K-fold cross-validation, and these probabilities are merged to form the metadata. In the second layer, a custom deep neural network serves as the blender: it takes the first-layer metadata as input, and its parameters are obtained by training on the multi-label probability matrix. By combining strong classifiers, the method achieves better performance than any single classifier. Experimental results show that the deep learning stacking model achieves an F1 score of about 87% on short-text classification, outperforming BERT, the convolutional neural network, the recurrent neural network, and other single models.
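The two-layer procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SigmoidLayer`, `kfold_indices`, `out_of_fold_probs`, and `build_meta_features` are hypothetical, a tiny per-label logistic model stands in both for the base classifiers (BERT, CNN, and GRU in the paper) and for the deep neural network blender, and the data is synthetic.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


class SigmoidLayer:
    """Toy stand-in model: one sigmoid output per label, trained with SGD."""

    def __init__(self, lr=0.5, epochs=50):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, Y):
        d, m = len(X[0]), len(Y[0])
        self.W = [[0.0] * (d + 1) for _ in range(m)]  # last weight is the bias
        for _ in range(self.epochs):
            for x, y in zip(X, Y):
                p = self.predict_proba([x])[0]
                for j in range(m):
                    g = p[j] - y[j]  # cross-entropy gradient w.r.t. the logit
                    for i in range(d):
                        self.W[j][i] -= self.lr * g * x[i]
                    self.W[j][d] -= self.lr * g
        return self

    def predict_proba(self, X):
        return [[sigmoid(sum(w * v for w, v in zip(wj, x)) + wj[-1])
                 for wj in self.W] for x in X]


def kfold_indices(n, k):
    """Return k (train_indices, validation_indices) pairs covering range(n)."""
    pairs = []
    for f in range(k):
        val = list(range(f, n, k))
        held_out = set(val)
        pairs.append(([i for i in range(n) if i not in held_out], val))
    return pairs


def out_of_fold_probs(make_model, X, Y, k):
    """Predict every sample with a model that did not see it during training."""
    oof = [None] * len(X)
    for train, val in kfold_indices(len(X), k):
        model = make_model().fit([X[i] for i in train], [Y[i] for i in train])
        for i, p in zip(val, model.predict_proba([X[i] for i in val])):
            oof[i] = p
    return oof


def build_meta_features(factories, X, Y, k=5):
    """Concatenate each base model's out-of-fold label probabilities per sample."""
    per_model = [out_of_fold_probs(f, X, Y, k) for f in factories]
    return [sum((probs[i] for probs in per_model), []) for i in range(len(X))]


# Synthetic two-label data (assumption: real inputs would be text features).
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(40)]
Y = [[int(x[0] > 0), int(x[1] > 0)] for x in X]

# First layer: three differently configured stand-ins for the base classifiers.
factories = [lambda: SigmoidLayer(0.5, 50),
             lambda: SigmoidLayer(0.1, 20),
             lambda: SigmoidLayer(1.0, 10)]
meta = build_meta_features(factories, X, Y, k=5)

# Second layer: the blender is trained on the metadata, not on the raw inputs.
blender = SigmoidLayer(0.5, 100).fit(meta, Y)
pred = [[int(p >= 0.5) for p in row] for row in blender.predict_proba(meta)]
```

Because every row of the metadata comes from a model that never saw that sample during training, the second layer learns from predictions that resemble those the base classifiers would make on unseen data, which limits label leakage into the blender.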


Last Update: 2020-03-10