[1] SUN Qing-yun, WANG Jun-feng, ZHAO Zong-qu, et al. A Microblog Data Collection Method Based on Simulated Login Technology[J]. Computer Technology and Development, 2014, 24(03): 6-10.

A Microblog Data Collection Method Based on Simulated Login Technology

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
24
Issue:
2014, No. 03
Pages:
6-10
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-03-31

Article Info

Title:
A Microblog Data Collection Method Based on Simulated Login Technology
Article ID:
1673-629X(2014)03-0006-05
Author(s):
SUN Qing-yun [1], WANG Jun-feng [2], ZHAO Zong-qu [1], GAO Meng-chao [1]
Affiliations:
1. College of Computer Science, Sichuan University; 2. Key Laboratory of Visual Synthesis Graphics and Image Technology
Keywords:
microblog API; simulated login technology; Web crawler
CLC number:
TP301
Document code:
A
Abstract:
With the arrival of the Web 2.0 era, public opinion information is generated and spread ever more quickly on microblogs. Obtaining microblog data is therefore essential to analyzing that information effectively. Taking Sina Weibo as the object of study, this paper proposes a Web crawler collection scheme based on simulated login. The scheme works around the call-count limits that the microblog API imposes on developers, solves the authentication problem that traditional Web crawlers face, and speeds up collection so that a very large volume of microblog data can be gathered in a short time. Experiments show that a system built on this scheme collects microblog information quickly, is more flexible, and can supply a large amount of accurate data to support public opinion analysis systems.
Last Update: 1900-01-01