[1] GAO Jie-yun, ZHAO Feng-yu, LIU Ya. Text Classification of Modified Hybrid Feature Selection Based on Semantic Enhancement[J]. Computer Technology and Development, 2021, 31(01): 24-29. [doi:10.3969/j.issn.1673-629X.2021.01.005]

Text Classification of Modified Hybrid Feature Selection Based on Semantic Enhancement

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 01
Pages:
24-29
Section:
Big Data Analysis and Mining
Publication date:
2021-01-10

Article Info

Title:
Text Classification of Modified Hybrid Feature Selection Based on Semantic Enhancement
Article ID:
1673-629X(2021)01-0024-06
Author(s):
GAO Jie-yun, ZHAO Feng-yu, LIU Ya
School of Optoelectronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Keywords:
hybrid feature selection; semantic analysis; word embedding; text classification; LSTM
CLC number:
TP391.41
DOI:
10.3969/j.issn.1673-629X.2021.01.005
Abstract:
A core problem in text classification is how to extract key features that reflect the characteristics of a text and how to capture the mapping from features to categories. The traditional bag-of-words model has the advantage of treating every word as a feature, but its computational cost grows with the number of features and with the number of relationships between texts and features, and it ignores the semantic relationships among the text features themselves. Semantic relationships, in turn, help capture the correlation between a text and its features. To address this problem, we propose an enhanced hybrid feature selection method that first uses hybrid feature selection to reduce dimensionality and then uses word embeddings to semantically enhance low-frequency words. To verify the effect of enhanced hybrid feature selection on text classification, two experiments are constructed in which the LSTM algorithm is used to train and test the classification model. Experiments on 71,825 crawled news texts show that the semantics-based enhanced hybrid feature selection method improves classification efficiency while maintaining classification accuracy.
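
The two steps named in the abstract (dimensionality reduction by hybrid feature selection, then word-vector enhancement of low-frequency words) can be illustrated with a minimal sketch. The abstract does not state which selection criteria are combined or how the enhancement is performed, so everything below is an assumption made for illustration: the hybrid selector is taken to be a document-frequency filter followed by a chi-square ranking, and enhancement is taken to mean replacing a low-frequency word with its most similar retained feature when the cosine similarity of their word vectors passes a threshold. The function names, the parameters `min_df`, `top_k`, and `threshold`, the toy corpus, and the two-dimensional vectors are all hypothetical. The LSTM classifier is omitted.

```python
import math
from collections import Counter


def chi_square(term, docs, labels):
    """Largest chi-square statistic of `term` over all classes (2x2 table per class)."""
    n = len(docs)
    best = 0.0
    for cls in set(labels):
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
        n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
        n00 = n - n11 - n10 - n01
        denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
        if denom:
            best = max(best, n * (n11 * n00 - n10 * n01) ** 2 / denom)
    return best


def hybrid_select(docs, labels, min_df=2, top_k=4):
    """Assumed hybrid selector: document-frequency filter, then chi-square ranking."""
    df = Counter(t for d in docs for t in set(d))
    candidates = [t for t, f in df.items() if f >= min_df]
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda t: (-chi_square(t, docs, labels), t))
    return set(ranked[:top_k])


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def enhance(doc, selected, vectors, threshold=0.8):
    """Assumed enhancement: map each unselected word to its nearest selected feature.

    Words without a vector, or whose best similarity is below `threshold`, are dropped.
    """
    anchors = sorted(s for s in selected if s in vectors)
    out = []
    for tok in doc:
        if tok in selected:
            out.append(tok)
        elif tok in vectors and anchors:
            nearest = max(anchors, key=lambda s: cosine(vectors[tok], vectors[s]))
            if cosine(vectors[tok], vectors[nearest]) >= threshold:
                out.append(nearest)
    return out


# Toy corpus and hand-written 2-d "word vectors" (hypothetical data).
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "striker"],
    ["stock", "market", "bank"],
    ["stock", "bank", "equity"],
]
labels = ["sport", "sport", "finance", "finance"]
vectors = {
    "goal": [1.0, 0.0], "team": [0.6, 0.4],
    "stock": [0.0, 1.0], "bank": [0.4, 0.6],
    "striker": [0.9, 0.1], "equity": [0.1, 0.9],
}

selected = hybrid_select(docs, labels, min_df=2, top_k=4)
enhanced = [enhance(d, selected, vectors) for d in docs]
print(sorted(selected))
print(enhanced)
```

In this toy run the vocabulary shrinks from eight words to four, and the low-frequency words "striker" and "equity" are folded into retained features instead of being discarded, which is one plausible reading of how semantic enhancement could preserve information after dimensionality reduction. In practice the vectors would come from a trained embedding model, and the enhanced token sequences would be fed to the classifier.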


Last Update: 2020-01-10