[1] WANG Shan-shan, ZOU Jia, CHENG Xu, et al. GSGD: An Automatic Grading System Based on BERT and Ontology Reasoning [J]. Computer Technology and Development, 2020, 30(08): 97-102. [doi:10.3969/j.issn.1673-629X.2020.08.016]

GSGD: An Automatic Grading System Based on BERT and Ontology Reasoning

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 08
Pages:
97-102
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2020-08-10

Article Information

Title:
An Automatic Grading System Based on BERT and Ontology Reasoning
Article ID:
1673-629X(2020)08-0097-06
Author(s):
WANG Shan-shan (王珊珊) 1,2; ZOU Jia (邹佳) 1,2; CHENG Xu (程序) 1,2; LIU Wang-yang (刘汪洋) 1,2; CAI Hui-min (蔡惠民) 1,2
1. CETC Big Data Research Institute Co., Ltd., Guiyang 550022, Guizhou, China; 2. National Engineering Laboratory for Big Data Application in Improving Government Governance Capabilities, Guiyang 550022, Guizhou, China
Keywords:
data grading; government data; BERT; legal ontology; cosine similarity
CLC number:
TP39
DOI:
10.3969/j.issn.1673-629X.2020.08.016
Abstract:
Graded management of government data resources is key to government data sharing, opening, and governance. Because the data resources are large in scale, the grading system is incomplete, and tools are lacking, this work is mostly done manually, which leads to insufficient supporting basis, strong subjectivity, poor accuracy, and limited effectiveness. We therefore design and implement GSGD (grading system for government data), an automatic grading system for government data based on policies, regulations, and typical cases. First, an ontology library is built from policies, regulations, and typical cases, and custom inference rules are defined according to the grading objectives and the characteristics of the ontology. Then, BERT is used to obtain semantic word/sentence vectors for the input data and the keywords, and the cosine similarity between these vectors is computed. Finally, for keywords with high similarity, Jena is used to query and reason over the policy-and-regulation library and the typical-case library to obtain the grading result together with its basis, thereby grading government data automatically and improving the efficiency of grading work. Comparative experiments verify the effectiveness of the method.
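The similarity step described in the abstract can be illustrated with a minimal sketch. It only shows cosine similarity and thresholded keyword selection; it is not the authors' implementation. The toy 3-dimensional vectors stand in for BERT embeddings, and the names `cosine_similarity`, `top_keywords`, the keyword labels, and the threshold of 0.8 are all assumptions made for illustration. The later Jena query and reasoning step is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def top_keywords(query_vec, keyword_vecs, threshold=0.8):
    """Return (keyword, score) pairs with score >= threshold, best first.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    scored = [(kw, cosine_similarity(query_vec, vec))
              for kw, vec in keyword_vecs.items()]
    kept = [(kw, score) for kw, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for BERT word/sentence embeddings (made-up values).
keyword_vecs = {
    "personal identity": [0.9, 0.1, 0.0],
    "weather": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

# Keywords similar enough to the input would be passed on to ontology reasoning.
matches = top_keywords(query_vec, keyword_vecs)
```

In a full pipeline, the keywords kept in `matches` would be the ones handed to the ontology query stage, which returns the grade and its supporting basis.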
Last Update: 2020-08-10