[1] LI Tian-ju, XIE Zhi-feng, ZHANG Kan-hong, et al. Study and Application of Tobacco Anomaly Data Mining Based on Ensemble Learning[J]. Computer Technology and Development, 2020, 30(11): 128-135. [doi:10.3969/j.issn.1673-629X.2020.11.024]

Study and Application of Tobacco Anomaly Data Mining Based on Ensemble Learning

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
Issue 11, 2020
Pages:
128-135
Column:
Application Development Research
Publication date:
2020-11-10

Article Info

Title:
Study and Application of Tobacco Anomaly Data Mining Based on Ensemble Learning
Article ID:
1673-629X(2020)11-0128-08
Author(s):
LI Tian-ju (李天举)1, XIE Zhi-feng (谢志峰)1, ZHANG Kan-hong (张侃弘)2, TAO Yi-jun (陶亦筠)3, FAN Jie (范杰)2, TANG Zhen (汤臻)3
1. Shanghai University, Shanghai 200072, China; 2. Shanghai Tobacco Group Co., Ltd., Shanghai 200082, China; 3. Shanghai Tobacco Monopoly Administration, Shanghai 200120, China
Keywords:
abnormal data mining; ensemble learning; data preprocessing; data augmentation; Stacking model
Classification number:
TP399
DOI:
10.3969/j.issn.1673-629X.2020.11.024
Abstract:
To support the transformation of tobacco monopoly market supervision in Shanghai and raise the level of market supervision, abnormal data mining methods are introduced to strengthen the prediction and analysis of abnormal market activity. Drawing on current machine learning research, a tobacco anomaly data mining model based on multi-model Stacking ensemble learning is proposed; Stacking allows the strengths of the individual algorithm models to be combined. The data set consists of tobacco monopoly data from January 2016 to April 2019. Data preprocessing and related steps convert the raw data into indicators, and data augmentation is used to alleviate the data imbalance problem to some extent. The model is validated on this data, and the results show that the stronger the learning ability of the individual machine learning algorithms in the Stacking model and the lower the correlation among them, the better the predictions of the ensemble. Finally, the effectiveness of the model is verified through on-site inspections: after city-wide validation, the rate at which market inspections confirm problems with retailers can rise from about 5% at present to more than 15%.
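The Stacking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the base learners, the meta-learner, or the features, so everything below is an assumption made for illustration: the data is synthetic (every 10th row is labelled anomalous to mimic imbalance), each hypothetical base learner scores a single feature against the midpoint of the class means, and the meta-learner is a small logistic regression trained on out-of-fold base scores. It is not the authors' model.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n, rng):
    """Synthetic toy data (an assumption): every 10th row is labelled 1."""
    data = []
    for i in range(n):
        y = 1 if i % 10 == 0 else 0
        # Two noisy features whose means shift upward for the anomalous class.
        x = (rng.gauss(1.5 * y, 1.0), rng.gauss(1.0 * y, 1.0))
        data.append((x, y))
    return data


def fit_base(data, feature):
    """Hypothetical base learner: signed distance of one feature
    from the midpoint between the two class means."""
    pos = [x[feature] for x, y in data if y == 1]
    neg = [x[feature] for x, y in data if y == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x[feature] - midpoint


def fit_meta(features, labels, epochs=200, lr=0.1):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for j, fj in enumerate(f):
                grad_w[j] += err * fj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda f: sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)


def fit_stacking(data, n_features=2, k=3):
    # Out-of-fold scores: each row is scored by base learners that never saw it,
    # so the meta-learner is not trained on leaked base-learner outputs.
    oof = [None] * len(data)
    for fold in range(k):
        train = [row for i, row in enumerate(data) if i % k != fold]
        bases = [fit_base(train, j) for j in range(n_features)]
        for i, (x, _) in enumerate(data):
            if i % k == fold:
                oof[i] = [base(x) for base in bases]
    meta = fit_meta(oof, [y for _, y in data])
    # Refit the base learners on all rows for use at prediction time.
    bases = [fit_base(data, j) for j in range(n_features)]
    return lambda x: meta([base(x) for base in bases])


rng = random.Random(0)
train_rows = make_data(600, rng)
holdout = make_data(300, rng)
model = fit_stacking(train_rows)
probs = [model(x) for x, _ in holdout]
flagged = sum(p >= 0.5 for p in probs)
print(f"flagged {flagged} of {len(holdout)} held-out rows as anomalous")
```

In this sketch the two base learners look at different, independently generated features, which is one simple way to keep their outputs weakly correlated; with real data the base learners would more plausibly be different algorithm families trained on the same indicators.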


Last Update: 2020-11-10