[1] GUO Yao-mu, LIU Peng, SUN Yuan-le, et al. A Detection Method for Depression Based on Imbalanced Social Media Text[J]. Computer Technology and Development, 2024, 34(04): 153-161. [doi:10.3969/j.issn.1673-629X.2024.04.023]

A Detection Method for Depression Based on Imbalanced Social Media Text

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
34
Issue:
No. 04, 2024
Pages:
153-161
Section:
Artificial Intelligence
Publication date:
2024-04-10

Article Info

Title:
A Detection Method for Depression Based on Imbalanced Social Media Text
Article ID:
1673-629X(2024)04-0153-09
Author(s):
GUO Yao-mu, LIU Peng, SUN Yuan-le, BAI Qi-wei, ZHANG Shao-hua, LIU Jian*
School of Computer and Information, Hefei University of Technology, Xuancheng 242000, Anhui, China
Keywords:
imbalanced dataset; depression detection; ensemble learning; text classification; social media text data
CLC number:
TP399
DOI:
10.3969/j.issn.1673-629X.2024.04.023
Abstract:
Current depression detection models based on social media data adapt poorly to imbalanced data and are evaluated with incomplete metrics. To address both problems, we propose a depression detection method for social media text based on the Document Adaptive Enhanced Bagging-τSS3 (DAEB-τSS3) model, together with a new machine learning evaluation metric, GF(α,β)-Score. Building on the τ-SS3 model, we introduce confidence weighting to strengthen the influence of minority-class data. For ensemble learning, we use a document-adaptive enhanced Bagging method that replaces Bagging's random sampling with stratified sampling and adaptively enhances minority-class documents, improving the model's ability to handle imbalanced data. In the model evaluation stage, GF-Score is used for automatic parameter selection and for discarding underperforming base learners, which improves the model's reliability and stability. Experimental results on the E-Risk2017 depression detection dataset show that DAEB-τSS3 adapts better to imbalanced datasets and significantly outperforms τSS3, bidirectional long short-term memory networks, ERNIE 3.0, and other models: GF-Score, F1-Score, and G-Mean Score improve on average by 13%, 0.7%, and 26.9%, respectively, enabling more effective depression detection from imbalanced social media text.
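Two ideas in the abstract can be illustrated with a minimal Python sketch: drawing a Bagging sample by class (stratified) rather than from the whole pool, and scoring predictions with F1-Score and G-Mean. This is not the authors' implementation. The function names and the `minority_boost` factor are hypothetical, the fixed boost merely stands in for the adaptive enhancement rule that the abstract does not specify, and GF(α,β)-Score is omitted because its definition is not given here.

```python
import math
import random
from collections import defaultdict


def stratified_bootstrap(labels, minority_label, minority_boost=1.0, rng=None):
    """Return document indices for one bag, resampled with replacement per class.

    minority_boost is a hypothetical knob: the minority class is drawn
    round(minority_boost * class_size) times. It is an assumption for
    illustration, not the adaptive rule used by DAEB-tauSS3.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    bag = []
    for label, indices in by_class.items():
        n_draws = len(indices)
        if label == minority_label:
            n_draws = max(1, round(minority_boost * len(indices)))
        # Sampling inside each class keeps every class present in the bag.
        bag.extend(rng.choice(indices) for _ in range(n_draws))
    return bag


def f1_and_gmean(y_true, y_pred, positive=1):
    """Compute the standard F1-Score and G-Mean for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = math.sqrt(recall * specificity)  # penalizes ignoring either class
    return f1, gmean


if __name__ == "__main__":
    labels = [1] * 2 + [0] * 8  # toy data: 2 minority and 8 majority documents
    bag = stratified_bootstrap(labels, minority_label=1, minority_boost=2.0,
                               rng=random.Random(0))
    print("minority documents in bag:", sum(labels[i] == 1 for i in bag))
    print(f1_and_gmean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Because G-Mean multiplies recall by specificity, a classifier that labels everything as the majority class scores zero, which is why it is a common companion to F1-Score on imbalanced data.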

Last Update: 2024-04-10