[1] HUANG Dong-jin, GENG Xiao-yun, LI Na, et al. Film Rating Prediction System Based on Mixed Features[J]. Computer Technology and Development, 2020, 30(12): 136-141. [doi:10.3969/j.issn.1673-629X.2020.12.024]

Film Rating Prediction System Based on Mixed Features

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
No. 12, 2020
Pages:
136-141
Column:
Application Development Research
Publication date:
2020-12-10

Article Info

Title:
Film Rating Prediction System Based on Mixed Features
Article ID:
1673-629X(2020)12-0136-06
Author(s):
HUANG Dong-jin, GENG Xiao-yun, LI Na, DING You-dong
Shanghai University, Shanghai 200072, China
Keywords:
film rating prediction; machine learning; natural language processing; text vector features; BERT
CLC number:
TP181
DOI:
10.3969/j.issn.1673-629X.2020.12.024
Abstract:
Film rating is an important criterion for judging the quality of a film and is a valuable reference for investors and moviegoers, so film rating prediction has become a research hotspot in the film field. However, current rating prediction systems suffer from insufficient feature information, overly simple feature engineering, and reliance on a single machine learning algorithm, so their prediction error is relatively large. To address this problem, a prediction model based on mixed features is proposed in combination with natural language processing technology and applied to a film rating prediction system. The dataset is drawn from a commonly used film website, and the film feature information undergoes extensive feature engineering to obtain better training data. A trained BERT model vectorizes the text information in the film dataset to obtain text vector features, and a support vector machine (SVM) is trained on these features to produce a preliminary rating prediction. This rating is then used as one additional feature dimension, together with the film feature information, to train a random forest that predicts the final rating. Experimental results show that the prediction model is feasible, the error between predicted and actual values is small, and accuracy is significantly improved.
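The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several things here are assumptions that do not come from the abstract: the data is synthetic, random vectors stand in for the BERT text vectors, the function names (`make_synthetic_films`, `fit_two_stage`, `predict_two_stage`) and all hyperparameters are hypothetical, and the use of out-of-fold SVM predictions as the stage-two feature is one common way to limit leakage rather than something the paper states.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVR


def make_synthetic_films(n_films=300, text_dim=16, n_struct=4, seed=0):
    """Build synthetic stand-in data (NOT the paper's dataset)."""
    rng = np.random.default_rng(seed)
    # Assumption: random vectors stand in for BERT text vectors.
    text_vecs = rng.normal(size=(n_films, text_dim))
    # Assumption: random columns stand in for engineered film features.
    struct = rng.normal(size=(n_films, n_struct))
    rating = (6.0 + 0.8 * text_vecs[:, 0] + 0.5 * struct[:, 0]
              + rng.normal(scale=0.3, size=n_films))
    return text_vecs, struct, np.clip(rating, 0.0, 10.0)


def fit_two_stage(text_train, struct_train, y_train, seed=0):
    """Stage 1: SVM regression on text vectors. Stage 2: random forest."""
    svm = SVR(kernel="rbf", C=1.0)
    # Out-of-fold SVM ratings, so the forest is not trained on ratings
    # the SVM produced for rows it had already seen (an assumption here).
    oof_rating = cross_val_predict(svm, text_train, y_train, cv=5)
    svm.fit(text_train, y_train)
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    # The preliminary rating becomes one extra feature column.
    forest.fit(np.column_stack([struct_train, oof_rating]), y_train)
    return svm, forest


def predict_two_stage(svm, forest, text, struct):
    """Predict the final rating from text vectors and film features."""
    text_rating = svm.predict(text)
    return forest.predict(np.column_stack([struct, text_rating]))


text_vecs, struct, rating = make_synthetic_films()
text_tr, text_te, struct_tr, struct_te, y_tr, y_te = train_test_split(
    text_vecs, struct, rating, test_size=0.25, random_state=0
)
svm, forest = fit_two_stage(text_tr, struct_tr, y_tr)
pred = predict_two_stage(svm, forest, text_te, struct_te)
mae = mean_absolute_error(y_te, pred)
print(f"MAE on held-out synthetic films: {mae:.3f}")
```

In a real setting, `text_vecs` would be replaced by vectors produced by a BERT encoder over the films' text fields, and `struct` by the engineered film features. The numbers printed here only describe the synthetic data and say nothing about the paper's reported results.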

Last Update: 2020-12-10