[1] HU Wen-yu, YING Kang-hui*. Study of Instance-level Data Cleaning Technology[J]. Computer Technology and Development, 2022, 32(05): 22-28. [doi:10.3969/j.issn.1673-629X.2022.05.004]

Study of Instance-level Data Cleaning Technology

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 05
Pages:
22-28
Column:
Big Data Analysis and Mining
Publication date:
2022-05-10

Article Information

Title:
Study of Instance-level Data Cleaning Technology
Article ID:
1673-629X(2022)05-0022-07
Author(s):
HU Wen-yu (胡文瑜) 1,2; YING Kang-hui (应康辉) 1,2*
1. School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China;
2. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
Keywords:
instance-level data cleaning; attribute detection; attribute cleaning; repeated record detection; repeated record cleaning
CLC number:
TP309
DOI:
10.3969/j.issn.1673-629X.2022.05.004
Abstract:
With the rapid development of science, technology, and engineering over the past 20 years, many fields, such as optical observation, optical monitoring, health care, sensors, user data, Internet and financial companies, and supply chain systems, have produced massive amounts of data (for example, in medical testing, data arrives continuously, creating a "data disaster"). Effective data analysis and data mining depend on data availability and high data quality, and high data quality in turn requires that the data be cleaned. Data cleaning is the process of detecting and correcting dirty data; it is the basis of data analysis and management and a commonly used technique for improving data quality. Instance-level data cleaning is an important part of data cleaning. This paper compares, analyzes, and summarizes the detection and cleaning methods for attributes and repeated (duplicate) record values in instance-level data cleaning. It introduces applications of data cleaning technology in representative fields (electrical engineering, medicine, and transportation) and, drawing on these applications, offers suggestions for choosing an instance-level data cleaning technique suited to the characteristics of a given data set. Finally, it discusses the problems and challenges facing instance-level data cleaning technology and its future development directions.
Last Update: 2022-05-10