[1] SHI Ya-nan, ZHANG Tai-hong, CHEN Yan-hong, GUO Bin. Study on Large-scale Unstructured Data Indexing Technology [J]. Computer Technology and Development (计算机技术与发展), 2014, 24(12): 109-113.

 Study on Large-scale Unstructured Data Indexing Technology

Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
24
Issue:
No. 12, 2014
Pages:
109-113
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-12-10

Article Info

Title:
 Study on Large-scale Unstructured Data Indexing Technology
Article ID:
1673-629X(2014)12-0109-05
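Author(s):
 SHI Ya-nan (时亚南)[1], ZHANG Tai-hong (张太红)[1][2], CHEN Yan-hong (陈燕红)[1], GUO Bin (郭斌)[1]
1. College of Computer and Information Engineering, Xinjiang Agricultural University; 2. College of Information and Electrical Engineering, China Agricultural University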
Keywords:
 large-scale data; inverted index; block storage; linear hashing; B+ tree
CLC number:
TP31
Document code:
A
Abstract:
 To address the low retrieval efficiency, large disk footprint, and difficulty of updating that the ASPSeek search engine exhibits on large-scale data, this paper proposes an inverted index organization technique based on block storage, and compares by testing the performance of an external-memory B+ tree index and a linear hash index. The test results show that, per 10,000 records, linear hashing is faster than the B+ tree index for queries (reported figure: 57.40%), takes 2.44 times as long as the B+ tree index for insertions, and takes 83.52% of the B+ tree index's time for deletions; the linear hash index file is 109.56% the size of the B+ tree index file. These results indicate that the B+ tree index builds and updates faster, while the linear hash index occupies more disk space and offers better query performance.
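
As an illustration of the linear hashing scheme that the abstract compares against the B+ tree, here is a minimal in-memory sketch. It is not the authors' external-memory implementation: buckets are plain Python lists rather than disk pages, and the class name `LinearHashIndex` and the parameters `initial_buckets` and `max_load` are hypothetical names chosen for this example. It shows the defining idea only: the table grows one bucket at a time by splitting the bucket at a moving split pointer.

```python
class LinearHashIndex:
    """In-memory sketch of linear hashing (assumption: lists stand in for disk pages)."""

    def __init__(self, initial_buckets=4, max_load=2.0):
        self.n0 = initial_buckets   # bucket count at level 0
        self.level = 0              # number of completed doubling rounds
        self.split = 0              # index of the next bucket to split
        self.max_load = max_load    # average entries per bucket that triggers a split
        self.count = 0              # number of stored keys
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        size = self.n0 << self.level
        addr = hash(key) % size
        if addr < self.split:
            # This bucket was already split in the current round,
            # so the finer hash function decides between the two halves.
            addr = hash(key) % (size * 2)
        return addr

    def get(self, key, default=None):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return default

    def put(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()

    def delete(self, key):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.count -= 1
                return True
        return False

    def _split_one(self):
        size = self.n0 << self.level
        old = self.buckets[self.split]
        self.buckets.append([])            # new bucket lands at index split + size
        keep = []
        for k, v in old:
            if hash(k) % (size * 2) == self.split:
                keep.append((k, v))
            else:
                self.buckets[self.split + size].append((k, v))
        self.buckets[self.split] = keep
        self.split += 1
        if self.split == size:             # every bucket of this round has been split
            self.level += 1
            self.split = 0
```

A lookup touches exactly one bucket, which is consistent with the better query performance reported for linear hashing. Each insertion may additionally trigger a split that rewrites a bucket, which is one plausible contributor to the slower insertions; the paper's own analysis of the cause is not given on this page.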

Last Update: 2015-04-15