[1] XIANG Qian. Word Embeddings Evaluation Based on Clustering and Similarity Computing in Radiotelephony Communications[J]. Computer Technology and Development, 2020, 30(09): 137-142. [doi:10.3969/j.issn.1673-629X.2020.09.025]

Word Embeddings Evaluation Based on Clustering and Similarity Computing in Radiotelephony Communications

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 09
Pages:
137-142
Section:
Application Development Research
Publication date:
2020-09-10

Article Info

Title:
Word Embeddings Evaluation Based on Clustering and Similarity Computing in Radiotelephony Communications
Article ID:
1673-629X(2020)09-0137-06
Author(s):
XIANG Qian
School of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China
Keywords:
radiotelephony communications; word embeddings; concept categorization; sentence similarity; Siamese network
CLC number:
TP39
DOI:
10.3969/j.issn.1673-629X.2020.09.025
Abstract:
Radiotelephony communication is the means of voice communication between air traffic controllers and pilots, and it plays an important role in aircraft operations. In processing radiotelephony phraseology, word embeddings are an effective representation for capturing lexical semantics. To ensure the quality of the word embeddings fed into a controller-pilot human-machine dialogue system, an evaluation method is proposed that combines concept categorization based on K-Means with sentence similarity computation based on a Siamese network. The concept categorization experiment analyzes the accuracy with which words, represented by their vectors, are mapped to a manually categorized dictionary; the average accuracy reaches 80.2%. This gives shallow-level evidence that the word embeddings can represent semantics and distinguish words, which is consistent with the clearly categorized nature of air traffic control instructions. In the sentence similarity experiment, a Siamese-network-based model computes the similarity of pairs of air traffic control instructions. The similarity-judgment accuracy of this model is 93.6%, compared with 65.8% for a method based on WordNet hierarchy distance and 43.7% for a method based on edit distance. The Siamese model far exceeds the other two methods, which gives deeper-level evidence that the word embeddings capture lexical semantics well and meet the dialogue system's input requirements for embedding quality.
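As a rough illustration of the concept categorization test described in the abstract, the sketch below clusters word vectors with K-Means and scores the clusters against a hand-built dictionary. The toy 2-D vectors, the example words and categories, the function names `kmeans` and `categorization_accuracy`, the farthest-point initialisation, and the majority-category scoring rule are all assumptions made for illustration; the abstract does not state how the paper computes its accuracy.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def categorization_accuracy(labels, gold):
    """Map each cluster to its majority gold category (assumed scoring rule)
    and return the share of words whose category matches that majority."""
    correct = 0
    for j in set(labels):
        cats = [g for lab, g in zip(labels, gold) if lab == j]
        correct += Counter(cats).most_common(1)[0][1]
    return correct / len(gold)


# Hypothetical 2-D "embeddings" and hand-assigned categories (illustrative only,
# not taken from the paper's data).
words = ["climb", "descend", "maintain", "runway", "taxiway", "apron"]
vectors = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0), (0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]
gold = ["action", "action", "action", "location", "location", "location"]

labels = kmeans(vectors, k=2)
accuracy = categorization_accuracy(labels, gold)
print(dict(zip(words, labels)), accuracy)
```

On this well-separated toy data every word falls into the cluster of its own category, so the score is 1.0. With real embeddings the score would depend on the number of clusters, the initialisation, and how the manual dictionary is defined.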

Last Update: 2020-09-10