
  • ISSN 1006-3080
  • CN 31-1691/TQ

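Co-Evolutionary Feature Selection Algorithm Based on Variable-Length Particle and Multi-Behavior Interaction

LI Tengfei, FENG Xiang, YU Huiqun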

Citation: LI Tengfei, FENG Xiang, YU Huiqun. Co-Evolutionary Feature Selection Algorithm Based on Variable-Length Particle and Multi-Behavior Interaction[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20201207001


doi: 10.14135/j.cnki.1006-3080.20201207001
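Funding: National Natural Science Foundation of China (61772200, 61772201, 61602175); Shanghai Pujiang Talent Program (17PJ1401900); "Special Fund for Informatization Development" of the Shanghai Municipal Commission of Economy and Informatization (201602008)
More information
    About the author:

    LI Tengfei (1996-), master's student. Main research interests: evolutionary computation and artificial intelligence. E-mail: tyrandesal@163.com

    Corresponding author:

    FENG Xiang, E-mail: xfeng@ecust.edu.cn

  • CLC number: TP18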

Co-Evolutionary Feature Selection Algorithm Based on Variable-Length Particle and Multi-Behavior Interaction

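  • Abstract: For feature selection on large-scale datasets, variable-length particle swarm optimization (VLPSO), a particle swarm feature selection method with a variable-length representation, has shown good performance. However, it generates particles completely at random, which makes the initialization stage somewhat blind. VLPSO's single update mechanism and the isolation of information between sub-populations also limit the classification performance of the model. To address these shortcomings, a co-evolutionary feature selection method based on variable-length particles and multi-behavior interaction (M-CVLPSO) is proposed. First, to reduce the blindness of random initialization, a hierarchical initialization strategy in continuous space is adopted, which shortens the expected distance between the initial solutions and the optimal solution. Second, particles are divided by fitness into leaders, followers, and eliminated particles, and several update strategies are applied during iteration to dynamically balance the diversity and convergence of the algorithm. In addition, a dimensionality-reduction term is added to the fitness function, which further improves performance on some datasets. The convergence of the algorithm is proved theoretically, and experiments on 11 large-scale feature selection datasets evaluate classification accuracy, dimensionality reduction, and computation time. The results show that the proposed algorithm achieves better overall performance than the four comparison algorithms.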
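The abstract describes dividing particles by fitness into leaders, followers, and eliminated particles, but this page does not give the proportions or the update rules. The following is a minimal sketch of only the partition step, assuming that higher fitness is better; the function name `split_by_fitness` and the 20%/20% fractions are hypothetical and are not taken from the paper.

```python
def split_by_fitness(fitness, leader_frac=0.2, eliminated_frac=0.2):
    """Partition particle indices into (leaders, followers, eliminated).

    Assumptions (not from the paper): higher fitness is better, and the
    fractions used for leaders and eliminated particles are illustrative.
    """
    n = len(fitness)
    # Particle indices sorted from best to worst fitness.
    order = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    n_lead = max(1, int(n * leader_frac))   # always keep at least one leader
    n_elim = int(n * eliminated_frac)
    leaders = order[:n_lead]
    eliminated = order[n - n_elim:] if n_elim else []
    followers = order[n_lead:n - n_elim]
    return leaders, followers, eliminated


if __name__ == "__main__":
    fitness = [0.9, 0.5, 0.7, 0.3, 0.8, 0.6, 0.4, 0.2, 0.1, 0.95]
    leaders, followers, eliminated = split_by_fitness(fitness)
    print("leaders:", leaders)
    print("followers:", followers)
    print("eliminated:", eliminated)
```

Each group could then be given its own update strategy, which is how the abstract describes balancing diversity against convergence.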

     

  • Figure 1. CLPSO-based variable-length particle representation

    Figure 2. Hierarchical initialization

    Figure 3. Particle update under the multi-behavior interaction strategy

    Figure 4. Rankings of the five algorithms on each metric across the 11 datasets

    Figure 5. Average particle dimension of the population over iterations

    Figure 6. Fitness curves of the best particle

    Table 1. Experimental datasets

    Dataset    #Features  #Instances  #Classes  %Smallest class  %Largest class
    SRBCT      2308       83          4         13               35
    Leukemia1  5327       72          3         13               53
    Leukemia2  11225      72          3         28               39
    DLBCL      5469       77          2         25               75
    9Tumor     5726       60          9         3                15
    Brain1     5920       90          5         4                67
    Brain2     10367      50          4         14               30
    LSVT       310        126         2         33               66
    Lung       12600      203         5         3                68
    Carcinom   9182       174         11        3                15
    GLIOMA     4435       50          4         14               30

    Table 2. Parameter settings

    Parameter                           Setting
    Population size                     Features/20
    Maximum iterations                  100
    c1 = c2                             1.49445
    w                                   0.9 − 0.5 × (curr_iter / max_iter)
    Threshold for selected feature      0.6
    Dataset partitioning                10-fold cross-validation
    Max iterations for renew            7 (CLPSO, ECLPSO, VLPSO)
    Number of divisions                 12 (M-CVLPSO, VLPSO)
    Max iterations for length changing  9 (M-CVLPSO, VLPSO)
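Several of the settings in Table 2 can be read as small functions. The sketch below is only an illustration: the function names, the strict `>` comparison against the 0.6 threshold, and rounding Features/20 down to an integer are assumptions that the table itself does not state.

```python
def inertia_weight(curr_iter, max_iter=100):
    """Linearly decreasing inertia weight: w = 0.9 - 0.5 * (curr_iter / max_iter)."""
    return 0.9 - 0.5 * (curr_iter / max_iter)


def population_size(n_features):
    """Population size of Features/20, rounded down (rounding is an assumption)."""
    return max(1, n_features // 20)


def selected_features(position, threshold=0.6):
    """Indices of features whose position value exceeds the threshold.

    Using a strict '>' comparison is an assumption.
    """
    return [i for i, value in enumerate(position) if value > threshold]


if __name__ == "__main__":
    for it in (0, 50, 100):
        print(f"iteration {it}: w = {inertia_weight(it):.2f}")
    print("population for 2308 features:", population_size(2308))
    print("selected:", selected_features([0.1, 0.7, 0.6, 0.95]))
```

With these settings the inertia weight falls from 0.9 at the first iteration to 0.4 at the last, so early iterations favor exploration and later ones favor convergence.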

    Table 3. Average test results

    Algorithm LSVT                                Carcinom
              Time     Size     Best     Mean     Time     Size     Best     Mean
    M-CVLPSO  0.4      35.5     86.89    80.22    35.9     152.5    90.07    87.78
    VLPSO     0.7      33.2     83.14    80.19    46.2     136.0    87.76    85.38
    ECLPSO    2.0      138.0    83.63    78.99    311.8    4182.2   84.13    81.47
    CLPSO     1.7      137.7    79.67    76.49    234.1    4157.0   88.97    86.79
    PSO       2.0      141.3    81.24    77.86    360.4    2174.7   77.20    73.83
    Full      /        310.0    76.80    /        /        9182.0   73.65    /

    Algorithm SRBCT                               Leukemia1                           Leukemia2
              Time     Size     Best     Mean     Time     Size     Best     Mean     Time     Size     Best     Mean
    M-CVLPSO  0.8      63.7     100.00   99.70    4.5      48.7     98.06    95.49    12.1     29.8     95.56    92.94
    VLPSO     1.6      51.1     100.00   99.58    7.0      57.9     95.83    92.94    16.2     34.1     95.56    91.66
    ECLPSO    8.4      1051.0   90.83    86.08    41.5     2432.1   84.44    82.22    182.6    5123.2   90.56    87.44
    CLPSO     6.6      1056.4   100.00   98.95    32.5     2435.7   95.56    94.65    88.9     5117.1   93.33    91.66
    PSO       9.4      1103.2   92.50    88.94    47.1     2601.5   87.36    80.97    163.8    5427.7   92.22    89.67
    Full      /        2308.0   86.96    /        /        5327.0   79.72    /        /        11225.0  88.78    /

    Algorithm DLBCL                               9Tumor                              Brain1
              Time     Size     Best     Mean     Time     Size     Best     Mean     Time     Size     Best     Mean
    M-CVLPSO  4.6      30.9     94.00    89.56    4.1      43.7     66.67    58.00    7.3      33.2     80.42    75.56
    VLPSO     7.5      52.8     91.33    86.51    6.2      39.9     58.33    53.33    10.8     26.6     77.08    70.74
    ECLPSO    47.3     2500.3   84.83    81.69    39.0     2621.4   45.00    42.83    67.4     2721.4   80.00    74.17
    CLPSO     37.2     2487.9   96.50    93.98    32.1     2604.4   58.33    53.83    53.7     2679.5   77.50    75.12
    PSO       52.7     2682.5   86.33    83.54    44.1     2787.9   45.00    41.67    71.5     2921.0   77.08    73.33
    Full      /        5469.0   84.12    /        /        5726.0   37.23    /        /        5920.0   71.67    /

    Algorithm Brain2                              Lung                                GLIOMA
              Time     Size     Best     Mean     Time     Size     Best     Mean     Time     Size     Best     Mean
    M-CVLPSO  8.9      61.0     80.00    69.54    9.6      154.5    93.40    91.03    2.5      64.89    78.75    73.29
    VLPSO     13.4     81.8     79.58    68.29    27.6     369.4    93.20    91.28    4.3      148.2    77.50    69.91
    ECLPSO    76.0     4703.7   69.58    64.24    78.1     1517.7   93.07    91.01    19.5     2033.9   80.00    73.95
    CLPSO     58.2     4706.5   82.08    79.95    62.4     1506.5   94.25    92.02    15.3     1991.1   82.08    76.16
    PSO       84.7     5089.0   67.08    62.13    84.7     1613.7   93.18    90.43    23.4     2207.8   73.56    68.28
    Full      /        10367.0  62.50    /        /        3312.0   86.83    /        /        4434.0   74.50    /

    Table 4. Average Friedman ranking of the five algorithms on the feature sets

    Algorithm  Time   Size   Best   Mean   Comprehensive
    M-CVLPSO   1.00   1.45   1.45   1.54   5.44
    VLPSO      2.00   1.55   2.63   2.91   9.09
    ECLPSO     4.09   3.82   3.91   3.91   15.73
    CLPSO      3.00   3.36   2.09   2.00   10.45
    PSO        4.90   4.82   3.91   4.54   18.17
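In Table 4, each Comprehensive value equals the sum of the four per-metric average ranks in its row. The sketch below shows one way such average ranks could be computed and then summed. It assumes the common Friedman convention of giving tied algorithms the mean of their positions, which this page does not confirm; the function name `average_ranks` is hypothetical, and the two-dataset example uses only the Best columns for LSVT and SRBCT from Table 3.

```python
def average_ranks(scores, higher_is_better=True):
    """Mean rank of each algorithm over datasets (rank 1 = best).

    scores maps an algorithm name to a list of per-dataset values.
    Tied algorithms share the mean of the positions they occupy.
    """
    names = list(scores)
    n_datasets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda a: scores[a][d], reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j to cover every algorithm tied with ordered[i].
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for k in range(i, j + 1):
                totals[ordered[k]] += tied_rank
            i = j + 1
    return {name: totals[name] / n_datasets for name in names}


# Best accuracy on LSVT and SRBCT, copied from Table 3.
best = {
    "M-CVLPSO": [86.89, 100.00],
    "VLPSO": [83.14, 100.00],
    "ECLPSO": [83.63, 90.83],
    "CLPSO": [79.67, 100.00],
    "PSO": [81.24, 92.50],
}

# Per-metric average ranks (Time, Size, Best, Mean), copied from Table 4.
table4 = {
    "M-CVLPSO": [1.00, 1.45, 1.45, 1.54],
    "VLPSO": [2.00, 1.55, 2.63, 2.91],
    "ECLPSO": [4.09, 3.82, 3.91, 3.91],
    "CLPSO": [3.00, 3.36, 2.09, 2.00],
    "PSO": [4.90, 4.82, 3.91, 4.54],
}
comprehensive = {name: round(sum(ranks), 2) for name, ranks in table4.items()}

if __name__ == "__main__":
    print(average_ranks(best))
    print(comprehensive)
```

A lower Comprehensive value means a better overall rank, so M-CVLPSO (5.44) ranks first and PSO (18.17) last.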

    Table 5. Mean test results with the new fitness function

    Model     DLBCL                               Brain2
              Time     Size     Best     Mean     Time     Size     Best     Mean
    ALL-Best  4.6      30.9     96.50    93.98    8.9      61.0     82.08    79.95
    I         4.6      30.9     94.00    89.56    8.9      61.0     80.00    69.54
    II        5.1      31.1     97.5     91.03    9.3      65.9     82.50    73.63

    Model     Lung                                GLIOMA
              Time     Size     Best     Mean     Time     Size     Best     Mean
    ALL-Best  9.6      154.5    94.25    92.02    2.5      64.9     82.08    76.16
    I         9.6      154.5    93.40    91.03    2.5      64.9     78.75    76.67
    II        10.1     149.0    94.27    90.96    2.6      71.0     76.67    70.04

    Table 6. Average single-iteration test results with and without hierarchical initialization

    Dataset    Strategy  Gbf    Gbs      Avf
    SRBCT      With      0.90   149.2    0.82
               Without   0.87   200.7    0.78
    Leukemia1  With      0.87   221.1    0.78
               Without   0.83   209.1    0.74
    Leukemia2  With      0.83   448.8    0.76
               Without   0.83   365.5    0.76
    DLBCL      With      0.85   218.4    0.76
               Without   0.81   195.4    0.72
    9Tumor     With      0.52   218.9    0.38
               Without   0.51   224.5    0.38
    Brain1     With      0.71   265.8    0.62
               Without   0.67   257.4    0.59
    Brain2     With      0.73   689.9    0.65
               Without   0.74   892.0    0.65
    Lung       With      0.87   435.7    0.83
               Without   0.83   1465.7   0.79
    GLIOMA     With      0.79   531.0    0.72
               Without   0.75   601.5    0.68
    LSVT       With      0.75   44.5     0.67
               Without   0.71   43.6     0.63
    Carcinom   With      0.83   428.4    0.77
               Without   0.80   333.7    0.73

    Table 7. Ablation experiment results for the hierarchical initialization strategy

    Model     SRBCT                               Leukemia1
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     1.6      52.3     100.00   99.67    6.6      53.1     95.83    92.83
    I'        1.4      51.7     100.00   99.25    7.6      45.5     98.06    93.33

    Model     Leukemia2                           DLBCL
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     16.1     34.3     93.89    92.33    7.9      59.6     90.67    88.03
    I'        22.4     32.9     91.67    90.22    7.6      52.0     92.50    89.00

    Model     9Tumor                              Brain1
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     6.0      39.0     58.33    52.33    11.7     29.8     77.08    70.08
    I'        6.8      46.0     58.33    54.00    11.1     28.2     77.92    70.59

    Model     Brain2                              Lung
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     16.1     77.8     75.42    66.75    27.6     381.0    92.56    91.41
    I'        14.2     99.5     78.75    73.33    28.6     422.0    94.32    91.86

    Model     GLIOMA                              LSVT
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     4.3      181.2    77.50    69.41    0.7      32.3     79.60    78.48
    I'        4.5      130.4    75.42    70.00    0.7      37.8     79.11    77.13

    Model     Carcinom
              Time     Size     Best     Mean
    VLPSO     44.9     145.9    87.76    85.08
    I'        54.1     145.7    92.29    88.29

    Table 8. Ablation experiment results for the multi-behavior interaction strategy

    Model     SRBCT                               Leukemia1
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     1.6      52.3     100.00   99.67    6.6      53.1     95.83    92.83
    II'       0.9      65.6     100.00   99.42    4.2      47.6     95.83    91.25

    Model     Leukemia2                           DLBCL
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     16.1     34.3     93.89    92.33    7.9      59.6     90.67    88.03
    II'       12.3     29.7     93.89    92.00    6.8      31.6     92.33    87.50

    Model     9Tumor                              Brain1
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     6.0      39.0     58.33    52.33    11.7     29.8     77.08    70.08
    II'       4.0      46.3     56.67    55.67    7.5      35.4     75.00    71.33

    Model     Brain2                              Lung
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     16.1     77.8     75.42    66.75    27.6     381.0    92.56    91.41
    II'       9.1      63.6     77.92    71.67    10.0     197.5    92.61    89.88

    Model     GLIOMA                              LSVT
              Time     Size     Best     Mean     Time     Size     Best     Mean
    VLPSO     4.3      181.2    77.50    69.41    0.7      32.3     79.60    78.48
    II'       2.3      58.3     82.92    74.08    0.4      35.4     82.33    78.69

    Model     Carcinom
              Time     Size     Best     Mean
    VLPSO     44.9     145.9    87.76    85.08
    II'       35.9     148.6    91.02    88.07
Publication history
  • Received: 2020-12-07
  • Published online: 2021-03-24
