[1] CHEN Guang-zhi, ZENG Lin, LIU Ban-chen, et al. Crawling and Analysis of Clothing Data on E-commerce Websites Based on Python[J]. Computer Technology and Development, 2022, 32(07): 46-51. [doi:10.3969/j.issn.1673-629X.2022.07.008]

Crawling and Analysis of Clothing Data on E-commerce Websites Based on Python

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 07
Pages:
46-51
Column:
Big Data Analysis and Mining
Publication date:
2022-07-10

Article Info

Title:
Crawling and Analysis of Clothing Data on E-commerce Websites Based on Python
Article ID:
1673-629X(2022)07-0046-06
Author(s):
CHEN Guang-zhi¹, ZENG Lin², LIU Ban-chen¹, ZENG Tian-you¹, WEI Xin-xin¹
1. School of Intelligent Engineering, Zhengzhou University of Aeronautics, Zhengzhou 450046, China;
2. School of Artificial Intelligence, Hezhou University, Hezhou 542899, China
Keywords:
e-commerce websites; clothing data; web crawling; data analysis; t-SNE clustering
CLC number:
TP311
DOI:
10.3969/j.issn.1673-629X.2022.07.008
Abstract:
E-commerce websites contain a great deal of valuable information, and China's textile and apparel industry has a very large consumer market, so crawling and analyzing clothing data from e-commerce websites is worthwhile. To capture current fashion trends and consumption hot spots for clothing products in a timely and accurate manner, so that merchants can target their product offerings precisely and consumers can shop more rationally, a crawling algorithm for clothing data on e-commerce websites, fashionDataScrape, is proposed. The algorithm separates the crawling of textual product descriptions from the crawling of product images, which gives it a degree of flexibility, and it can crawl clothing information by keyword. A detailed design class diagram of the algorithm is given. The algorithm is implemented in Python, mainly using the Requests and Beautiful Soup libraries, with lxml as the HTML parser. Clothing information was crawled for the keywords "new dresses for women", "women's t-shirts" and "cheongsam young version", and the crawled results were manually compared with the actual pages, verifying the feasibility and effectiveness of the algorithm. Product description analysis, price analysis and t-SNE clustering visualization of the images in the crawled results further confirm the value of crawling clothing data from e-commerce websites.
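The separation of text crawling from image crawling described in the abstract might be sketched as follows. This is not the authors' fashionDataScrape implementation: the search URL, the HTML markup, and the names `Item`, `ItemParser`, `build_search_url`, `crawl_text` and `crawl_images` are all hypothetical, and the standard-library `html.parser` is swapped in for Requests, Beautiful Soup and lxml so that the sketch runs without third-party packages or network access.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urlencode


@dataclass
class Item:
    title: str = ""
    price: str = ""
    image_url: str = ""


def build_search_url(keyword):
    """Build a keyword search URL (hypothetical site and query parameter)."""
    return "https://shop.example.com/search?" + urlencode({"q": keyword})


class ItemParser(HTMLParser):
    """Collect one Item per <div class="item"> block (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # text field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "item" in classes:
            self.items.append(Item())
        elif not self.items:
            return  # ignore markup that appears before the first item
        elif tag == "img":
            self.items[-1].image_url = attrs.get("src") or ""
        elif "title" in classes or "price" in classes:
            self._field = "title" if "title" in classes else "price"

    def handle_data(self, data):
        if self._field and data.strip():
            setattr(self.items[-1], self._field, data.strip())
            self._field = None

    def handle_endtag(self, tag):
        self._field = None  # an empty field must not capture later text


def crawl_text(html):
    """Stage 1: extract textual descriptions and image URLs, no downloads."""
    parser = ItemParser()
    parser.feed(html)
    return parser.items


def crawl_images(items, fetch):
    """Stage 2: fetch image bytes for items that stage 1 already produced.

    `fetch` is any callable mapping a URL to bytes; with Requests it could be
    lambda url: requests.get(url, timeout=10).content
    """
    return {i.image_url: fetch(i.image_url) for i in items if i.image_url}


SAMPLE_HTML = """
<div class="item">
  <span class="title">Floral dress</span>
  <span class="price">129.00</span>
  <img src="https://img.example.com/1.jpg">
</div>
<div class="item">
  <span class="title">Cotton t-shirt</span>
  <span class="price">59.90</span>
  <img src="https://img.example.com/2.jpg">
</div>
"""

if __name__ == "__main__":
    for item in crawl_text(SAMPLE_HTML):
        print(item)
```

Because the image stage takes only the items produced by the text stage, it can be run later, skipped, or retried on its own, which is one plausible reading of the flexibility the abstract mentions.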
Last Update: 2022-07-10