[1] REN Rong-zi, GAO Hang. Investigation on Layout Analysis Technology of Chinese and English Mixed OCR Based on Feedback Merging[J]. Computer Technology and Development, 2017, 27(03): 39-43.

 Investigation on Layout Analysis Technology of Chinese and English Mixed OCR Based on Feedback Merging

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 03
Pages:
39-43
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2017-03-10

Article Info

Title:
 Investigation on Layout Analysis Technology of Chinese and English Mixed OCR Based on Feedback Merging
Article ID:
1673-629X(2017)03-0039-05
Author(s):
 REN Rong-zi, GAO Hang
 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Keywords:
 character recognition; mixed Chinese and English; layout analysis; separation
Classification number:
TP301
Document code:
A
Abstract:
 Optical character recognition (OCR) is now widely used, and OCR for a single character set has seen major breakthroughs. However, because layout analysis for Chinese differs markedly from that for English, existing OCR techniques for mixed Chinese-English text perform unsatisfactorily. To address the shortcomings of traditional OCR methods, and building on a study of the technical difficulties of segmenting mixed Chinese-English layouts, this paper proposes an improved layout analysis and segmentation method based on feedback merging. The method preprocesses the image with Canny-operator-based binarization and median filtering, then segments the character regions twice using the projection method; specific segmentation techniques are examined in some depth. Comparative experiments show that the proposed method successfully separates the Chinese, English, and numeric characters in mixed Chinese-English documents. Its accuracy reaches 97%, about 8 percentage points higher than the traditional method, and it handles touching characters better than the traditional method does.
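
The abstract names the building blocks (median filtering, two passes of projection-based segmentation, and a merge step) but not their details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the image is already binarized (1 = ink, 0 = background); the Canny-based binarization is not reproduced. All function names (`median_filter_3x3`, `runs`, `split_lines`, `split_columns`, `merge_narrow`) and the width thresholds `min_ratio` and `max_ratio` are hypothetical choices made for illustration.

```python
from statistics import median
from typing import List, Tuple

Image = List[List[int]]   # binary image: 1 = ink, 0 = background
Span = Tuple[int, int]    # half-open interval [start, end)


def median_filter_3x3(img: Image) -> Image:
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def runs(profile: List[int]) -> List[Span]:
    """Return the spans where a projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def split_lines(img: Image) -> List[Span]:
    """First pass: horizontal projection (row sums) yields text-line spans."""
    return runs([sum(row) for row in img])


def split_columns(img: Image, top: int, bottom: int) -> List[Span]:
    """Second pass: vertical projection inside one text line yields column spans."""
    width = len(img[0])
    profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(profile)


def merge_narrow(spans: List[Span], line_height: int,
                 min_ratio: float = 0.6, max_ratio: float = 1.2) -> List[Span]:
    """Toy merge rule (an assumption, not the paper's criterion).

    A span narrower than min_ratio * line_height is joined with its right
    neighbour as long as the combined width stays within
    max_ratio * line_height, i.e. still looks like one roughly square glyph.
    """
    merged, i = [], 0
    while i < len(spans):
        start, end = spans[i]
        while (i + 1 < len(spans)
               and end - start < min_ratio * line_height
               and spans[i + 1][1] - start <= max_ratio * line_height):
            end = spans[i + 1][1]
            i += 1
        merged.append((start, end))
        i += 1
    return merged
```

A rule this simple would also glue adjacent narrow Latin letters together, which is presumably why a practical system needs the more careful segmentation techniques the abstract refers to. Note also that a 3x3 median filter erases strokes that are only one pixel wide, so the filter size has to suit the scan resolution.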


Last Update: 2017-05-12