[1] ZOU Hong-xu, PAN Guan-hua, LI Yin. Improved Collaborative Filtering Algorithm Based on Spark[J]. Computer Technology and Development, 2020, 30(05): 38-42. [doi:10.3969/j.issn.1673-629X.2020.05.008]

Improved Collaborative Filtering Algorithm Based on Spark

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 05
Pages:
38-42
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2020-05-10

Article Info

Title:
Improved Collaborative Filtering Algorithm Based on Spark
Article ID:
1673-629X(2020)05-0038-05
Author(s):
ZOU Hong-xu, PAN Guan-hua, LI Yin
Jiangsu Automation Research Institute of CSIC, Lianyungang 222006, China
Keywords:
collaborative filtering; Spark; sparse data; similarity calculation; equi-join
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2020.05.008
Abstract:
As the volume of Internet data keeps growing, a single machine can neither finish computing recommendation algorithms over large-scale data within an acceptable time nor store the data. Exploiting the in-memory computing of the Spark platform, a distributed item-based collaborative filtering algorithm is designed and implemented with the RDD (resilient distributed dataset) operators that Spark provides. To address the inaccurate similarity values caused by sparse data, a similarity formula weighted by the number of users common to two items is proposed, which improves the accuracy of the final recommendations. To reduce the time spent on the equi-joins of data tables that the computation involves, a user-defined Hash_join function replaces Spark's built-in join operator, which improves computational efficiency. The algorithm is tested on the public MovieLens dataset and compared with both the unimproved algorithm and a single-machine implementation. The results show that the improved algorithm performs better in both accuracy and efficiency.
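
The abstract names two ideas without giving their details: an equi-join done with a hash table, and an item similarity weighted by the number of common users. The sketch below illustrates both in plain Python rather than Spark, under stated assumptions. The `hash_join` here is a generic build-and-probe hash join, not the paper's Hash_join function, and the weighting `n / (n + shrink)` is a common shrinkage form chosen for illustration, not the formula proposed in the paper. The names `item_similarities` and `shrink` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def hash_join(left, right):
    """Equi-join two lists of (key, value) pairs; returns (key, (left_value, right_value))."""
    # Build phase: index the smaller input by key.
    swap = len(left) > len(right)
    build, probe = (right, left) if swap else (left, right)
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    # Probe phase: stream the larger input and emit one row per match.
    out = []
    for key, value in probe:
        for other in table.get(key, ()):
            out.append((key, (value, other) if swap else (other, value)))
    return out


def item_similarities(ratings, shrink=2.0):
    """ratings: list of (user, item, rating). Returns {(item_a, item_b): similarity}."""
    keyed = [(user, (item, r)) for user, item, r in ratings]
    # Squared norm of each item's full rating vector.
    norms = defaultdict(float)
    for _, item, r in ratings:
        norms[item] += r * r
    dot = defaultdict(float)
    common = defaultdict(int)
    # Self-join on user: pairs up items rated by the same user.
    for _, ((a, ra), (b, rb)) in hash_join(keyed, keyed):
        if a < b:  # keep each unordered pair once, skip self-pairs
            dot[(a, b)] += ra * rb
            common[(a, b)] += 1
    sims = {}
    for (a, b), d in dot.items():
        cosine = d / sqrt(norms[a] * norms[b])
        n = common[(a, b)]
        # Assumed weighting: pairs with few common users are shrunk toward 0.
        sims[(a, b)] = cosine * n / (n + shrink)
    return sims


ratings = [
    ("u1", "A", 1.0), ("u1", "B", 1.0),
    ("u2", "A", 1.0), ("u2", "B", 1.0), ("u2", "C", 1.0),
]
sims = item_similarities(ratings)
print(sims[("A", "B")])  # 0.5
```

In this toy data, items A and B share two users while A and C share one, so the weighted score of (A, C) falls below that of (A, B) by more than plain cosine alone would give. In Spark the build side of such a join would typically be shipped to the executors as a broadcast variable, which is one way a custom join can avoid a shuffle. Whether the paper does this is not stated in the abstract.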


Last Update: 2020-05-10