
  • ISSN 1006-3080
  • CN 31-1691/TQ

Adaptive Weighted Concept Drift Detection Method Based on McDiarmid Boundary

HU Yang, SUN Ziqiang

Citation: HU Yang, SUN Ziqiang. Adaptive Weighted Concept Drift Detection Method Based on McDiarmid Boundary[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20211215002


doi: 10.14135/j.cnki.1006-3080.20211215002
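Details
    About the author:

    HU Yang (b. 1996), male, from Ji'an, Jiangxi; master's student. Research interests: data stream mining and concept drift detection. E-mail: 775115091@qq.com

    Corresponding author:

    SUN Ziqiang, E-mail: sunziqiang@ecust.edu.cn

  • CLC number: TP391.4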

Adaptive Weighted Concept Drift Detection Method Based on McDiarmid Boundary

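  • Abstract: Active concept drift detection methods suffer from high detection delay and are prone to missed detections and false alarms. To address these problems, an adaptive weighted concept drift detection method based on the McDiarmid boundary (WMDDM) is proposed. A decay function weights the classification results, assigning lower weights to older data so that newer data carry more influence. McDiarmid's inequality is used to derive a confidence bound on the weighted classification accuracy. When the accuracy is detected to have dropped by more than this bound, the decay factor is adjusted, so the weights change dynamically. The method is compared mainly with DDM (Drift Detection Method), RDDM (Reactive Drift Detection Method), HDDM (Drift Detection Method based on Hoeffding's inequality), FHDDM (Fast Hoeffding Drift Detection Method), and ADWIN (adaptive windowing). The results show that the proposed algorithm has the lowest false positive rate and false negative rate, and that its average detection delay and accuracy rank in the top two among the six algorithms.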
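    The mechanism summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WMDDM implementation: the abstract does not give the decay function, the thresholds, or the update rule, so everything below is an assumption made for illustration. The sketch assumes a fixed-size sliding window of 0/1 classification results, geometric decay weights (`decay ** age`), and a two-level rule in which a "warning" drop lowers the decay factor and a larger drop signals drift. The class name and the parameters `window_size`, `decay`, `min_decay`, `shrink`, `delta_warn`, and `delta_drift` are hypothetical. The bound itself follows from applying McDiarmid's inequality to a weighted mean of values in [0, 1]: with normalized weights v_i and confidence level delta, epsilon = sqrt((sum of v_i^2 / 2) * ln(1 / delta)).

```python
import math
from collections import deque


class WeightedMcDiarmidDetector:
    """Illustrative weighted drift detector (assumed design, not the paper's exact WMDDM)."""

    def __init__(self, window_size=100, decay=0.99, min_decay=0.90,
                 shrink=0.98, delta_warn=1e-3, delta_drift=1e-6):
        self.window_size = window_size
        self.base_decay = decay        # initial decay factor (hypothetical default)
        self.min_decay = min_decay     # lower limit for the adapted decay factor
        self.shrink = shrink           # multiplicative adjustment applied on a warning
        self.delta_warn = delta_warn   # confidence level of the warning bound
        self.delta_drift = delta_drift # confidence level of the drift bound
        self.reset()

    def reset(self):
        self.window = deque(maxlen=self.window_size)
        self.decay = self.base_decay
        self.best_accuracy = 0.0

    def _normalized_weights(self):
        n = len(self.window)
        # Index 0 is the oldest result, so its age is n - 1 and its weight is smallest.
        raw = [self.decay ** (n - 1 - i) for i in range(n)]
        total = sum(raw)
        return [w / total for w in raw]

    @staticmethod
    def _bound(weights, delta):
        # McDiarmid bound for a weighted mean of values in [0, 1].
        return math.sqrt(0.5 * sum(v * v for v in weights) * math.log(1.0 / delta))

    def update(self, correct):
        """Feed one classification result; return 'stable', 'warning', or 'drift'."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.window_size:
            return "stable"
        weights = self._normalized_weights()
        accuracy = sum(v * x for v, x in zip(weights, self.window))
        self.best_accuracy = max(self.best_accuracy, accuracy)
        drop = self.best_accuracy - accuracy
        if drop >= self._bound(weights, self.delta_drift):
            self.reset()
            return "drift"
        if drop >= self._bound(weights, self.delta_warn):
            # Assumed adaptation step: forget old results faster once accuracy falls.
            self.decay = max(self.min_decay, self.decay * self.shrink)
            return "warning"
        return "stable"


# Synthetic stream: 500 correct predictions, then the classifier starts failing.
stream = [True] * 500 + [False] * 100
detector = WeightedMcDiarmidDetector()
signals = [detector.update(c) for c in stream]
first_drift = signals.index("drift")
print("first drift signalled at sample", first_drift)
```

    Lowering the decay factor concentrates weight on the newest results, which makes the weighted accuracy react faster but also widens the bound, since the sum of squared normalized weights grows. A real detector would need to balance these two effects.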

     

  • Figure 1. Conceptual diagram of real drift and virtual drift

    Figure 2. Types of concept drift

    Figure 3. Weighted window description

    Figure 4. Flow chart of the WMDDM algorithm

    Figure 5. Schematic diagram of detection points

    Figure 6. Comparison of classification accuracy on the Electricity dataset

    Table 1. Dataset characteristics

    Data set    | Instances | Attributes | Classes | Number of drifts | Noise rate | Concept length | Drift type
    SINE        | 100 000   | 2          | 2       | 4                | 10%        | 20 000         | Sudden
    MIXED       | 100 000   | 4          | 2       | 4                | 10%        | 20 000         | Sudden
    LED         | 100 000   | 24         | 10      | 3                | 10%        | 25 000         | Gradual
    CIRCLE      | 100 000   | 2          | 2       | 3                | 10%        | 25 000         | Gradual
    Electricity | 45 312    | 7          | 2       | Unknown          | Unknown    | Unknown        | Unknown

    Table 2. Results of experiment 1

    Detector | FPR/% | FNR/% | ADOD  | Accuracy/%
    WMDDM    | 0     | 0     | 37.25 | 85.34
    WMDDM#1  | 80    | 0     | 44.00 | 85.32
    FHDDM    | 0     | 0     | 44.00 | 85.32

    Table 3. Results of experiment 2

    Detector | FPR/% | FNR/% | ADOD  | Accuracy/%
    WMDDM    | 0     | 0     | 69.67 | 86.25
    WMDDM#1  | 81.25 | 0     | 49.00 | 86.26
    FHDDM    | 0     | 0     | 74.00 | 86.24

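    Table 4. Experimental results on the SINE dataset

    Classifier | Detector | TP | FP | FN | FPR/% | FNR/% | ADOD   | Accuracy/%
    HT         | WMDDM    | 4  | 1  | 0  | 20.0  | 0     | 47.00  | 86.36
    HT         | FHDDM    | 4  | 1  | 0  | 20.0  | 0     | 49.75  | 85.37
    HT         | HDDM     | 4  | 1  | 0  | 20.0  | 0     | 34.50  | 86.39
    HT         | DDM      | 4  | 0  | 0  | 0     | 0     | 150.50 | 86.05
    HT         | RDDM     | 4  | 3  | 0  | 42.9  | 0     | 100.25 | 86.08
    HT         | ADWIN    | 4  | 27 | 0  | 87.1  | 0     | 64.25  | 84.09
    NB         | WMDDM    | 4  | 0  | 0  | 0     | 0     | 37.25  | 85.34
    NB         | FHDDM    | 4  | 0  | 0  | 0     | 0     | 44.00  | 85.32
    NB         | HDDM     | 4  | 0  | 0  | 0     | 0     | 33.25  | 85.36
    NB         | DDM      | 4  | 0  | 0  | 0     | 0     | 152.75 | 85.06
    NB         | RDDM     | 4  | 0  | 0  | 0     | 0     | 101.00 | 85.17
    NB         | ADWIN    | 4  | 6  | 0  | 60.0  | 0     | 67.00  | 84.72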

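    Table 5. Experimental results on the MIXED dataset

    Classifier | Detector | TP | FP | FN | FPR/% | FNR/% | ADOD   | Accuracy/%
    HT         | WMDDM    | 4  | 2  | 0  | 33.3  | 0     | 41.00  | 85.98
    HT         | FHDDM    | 4  | 7  | 0  | 63.6  | 0     | 41.00  | 85.24
    HT         | HDDM     | 4  | 5  | 8  | 59.3  | 0     | 35.75  | 85.38
    HT         | DDM      | 4  | 11 | 0  | 73.3  | 0     | 130.75 | 84.27
    HT         | RDDM     | 4  | 9  | 0  | 69.2  | 0     | 88.25  | 85.91
    HT         | ADWIN    | 4  | 9  | 0  | 69.2  | 0     | 76.00  | 84.82
    NB         | WMDDM    | 4  | 0  | 0  | 0     | 0     | 43.50  | 86.62
    NB         | FHDDM    | 4  | 0  | 0  | 0     | 0     | 45.50  | 86.62
    NB         | HDDM     | 4  | 0  | 0  | 0     | 0     | 33.75  | 86.60
    NB         | DDM      | 4  | 0  | 0  | 0     | 0     | 149.00 | 86.26
    NB         | RDDM     | 4  | 0  | 0  | 0     | 0     | 103.00 | 86.41
    NB         | ADWIN    | 4  | 6  | 0  | 60.0  | 0     | 67.50  | 86.03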

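    Table 6. Experimental results on the LED dataset

    Classifier | Detector | TP | FP  | FN | FPR/% | FNR/% | ADOD   | Accuracy/%
    HT         | WMDDM    | 3  | 0   | 0  | 0     | 0     | 237.67 | 89.68
    HT         | FHDDM    | 3  | 0   | 0  | 0     | 0     | 256.00 | 89.64
    HT         | HDDM     | 2  | 1   | 0  | 33.3  | 0     | 367.67 | 89.61
    HT         | DDM      | 0  | 3   | 3  | 100.0 | 100.0 | 400.0  | 89.53
    HT         | RDDM     | 2  | 2   | 1  | 50.0  | 33.3  | 380.67 | 89.59
    HT         | ADWIN    | 0  | 266 | 3  | 100.0 | 100.0 | 400.0  | 87.50
    NB         | WMDDM    | 3  | 0   | 0  | 0     | 0     | 237.67 | 89.68
    NB         | FHDDM    | 3  | 0   | 0  | 0     | 0     | 256.00 | 89.67
    NB         | HDDM     | 2  | 1   | 1  | 33.3  | 33.3  | 367.67 | 89.61
    NB         | DDM      | 0  | 3   | 3  | 100.0 | 100.0 | 400.0  | 89.53
    NB         | RDDM     | 2  | 2   | 1  | 50.0  | 33.3  | 380.67 | 89.59
    NB         | ADWIN    | 0  | 266 | 3  | 100.0 | 100.0 | 400.0  | 87.50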

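    Table 7. Experimental results on the CIRCLE dataset

    Classifier | Detector | TP | FP | FN | FPR/% | FNR/% | ADOD   | Accuracy/%
    HT         | WMDDM    | 3  | 0  | 0  | 0     | 0     | 58.33  | 87.18
    HT         | FHDDM    | 3  | 0  | 0  | 0     | 0     | 61.33  | 87.16
    HT         | HDDM     | 3  | 0  | 0  | 0     | 0     | 44.33  | 87.19
    HT         | DDM      | 2  | 1  | 1  | 33.3  | 33.3  | 332.67 | 86.97
    HT         | RDDM     | 3  | 1  | 0  | 25.0  | 0     | 246.33 | 87.03
    HT         | ADWIN    | 3  | 4  | 0  | 57.1  | 0     | 93.67  | 86.74
    NB         | WMDDM    | 3  | 0  | 0  | 0     | 0     | 69.67  | 86.25
    NB         | FHDDM    | 3  | 0  | 0  | 0     | 0     | 74.00  | 86.24
    NB         | HDDM     | 3  | 0  | 0  | 0     | 0     | 41.33  | 86.26
    NB         | DDM      | 2  | 1  | 1  | 33.3  | 33.3  | 325.33 | 86.06
    NB         | RDDM     | 3  | 0  | 0  | 0     | 0     | 225.00 | 86.17
    NB         | ADWIN    | 3  | 2  | 0  | 40.0  | 0     | 113.67 | 86.20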
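
    The rate columns in Tables 4 to 7 appear to follow from the count columns. This page does not state the formulas, so the definitions below are inferred from the reported values rather than taken from the paper: FPR = FP / (TP + FP) and FNR = FN / (TP + FN), each given as a percentage. The helper name `detection_rates` is hypothetical. A short Python check against a few reported rows:

```python
def detection_rates(tp, fp, fn):
    """Return (FPR %, FNR %) rounded to one decimal, using the inferred definitions."""
    fpr = 100.0 * fp / (tp + fp) if (tp + fp) else 0.0
    fnr = 100.0 * fn / (tp + fn) if (tp + fn) else 0.0
    return round(fpr, 1), round(fnr, 1)


# (TP, FP, FN) counts copied from the tables, with the rates they report.
reported = {
    "SINE / HT / RDDM": ((4, 3, 0), (42.9, 0.0)),
    "SINE / HT / ADWIN": ((4, 27, 0), (87.1, 0.0)),
    "LED / NB / RDDM": ((2, 2, 1), (50.0, 33.3)),
    "CIRCLE / HT / DDM": ((2, 1, 1), (33.3, 33.3)),
}

for name, (counts, rates) in reported.items():
    print(name, detection_rates(*counts), "reported:", rates)
```

    Under these definitions a detector that raises many alarms, such as ADWIN on SINE, has a high FPR even when it catches every true drift.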
Publication history
  • Received: 2021-12-15
  • Published online: 2022-04-12
