
    钱恒, 虞慧群, 范贵生. 基于增量式随机森林的燃气负荷预测方法[J]. 华东理工大学学报(自然科学版), 2019, 45(1): 133-139. DOI: 10.14135/j.cnki.1006-3080.20180111002
    QIAN Heng, YU Huiqun, FAN Guisheng. A Gas Consumption Prediction Method Based on Incremental Random Forest Regression Algorithm[J]. Journal of East China University of Science and Technology, 2019, 45(1): 133-139. DOI: 10.14135/j.cnki.1006-3080.20180111002

    基于增量式随机森林的燃气负荷预测方法

    A Gas Consumption Prediction Method Based on Incremental Random Forest Regression Algorithm

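    • Abstract (translated from the Chinese): With the spread of the smart gas grid concept and the development of smart gas meters, the volume of gas load data is growing exponentially, and gas load forecasting faces new challenges. Traditional prediction methods based on offline batch learning can no longer meet the demand for real-time prediction on large data volumes. For the prediction scenario in which gas load data arrive incrementally, an Incremental Random Forest Regression (IRFR) algorithm is proposed. The algorithm stores a certain number of samples in each leaf node and controls leaf-node splitting by measuring the coefficient of variation of the stored sample set; for large data volumes, a sample discard strategy is designed to control memory usage. After feature selection, feature extraction, and modeling on gas load data from Shanghai, the IRFR algorithm is applied to gas load forecasting. Experimental results show that IRFR achieves accuracy comparable to that of the traditional random forest algorithm while requiring less training time, making it better suited to incremental learning on large data volumes.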

       

      Abstract: With the development of smart gas grids and the improvement of gas meters, the volume of gas consumption data is increasing exponentially, which brings new challenges to gas consumption prediction. Traditional prediction algorithms based on offline batch learning can hardly meet the requirements of real-time prediction on big data. To handle gas consumption data that arrive incrementally, this paper proposes an incremental random forest regression (IRFR) algorithm. First, a certain number of samples are stored in leaf nodes according to their feature values. Second, the coefficient of variation of the stored samples is compared against a threshold to control the splitting of leaf nodes, so that the trees support incrementally arriving samples. In addition, for the case in which the samples exceed the maximum sample storage, this paper proposes a sample discard strategy: find the leaf node storing the most samples, sort its samples, and discard half of them. The proposed algorithm thus adapts better to big data. Finally, a simulation experiment is conducted on gas consumption prediction in Shanghai, covering data preprocessing, feature selection, IRFR model training, and model parameter adjustment. In feature selection, weather, the particularity of the date, and historical consumption data are chosen. Experimental results show that, compared with the traditional random forest based on batch learning, IRFR attains comparable prediction accuracy with less training time, especially on big data. Consequently, IRFR is more suitable than the traditional random forest for incremental learning with big data.
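The two leaf-level mechanisms named in the abstract, the coefficient-of-variation split test and the sample discard strategy, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Leaf`, `should_split`, `discard_if_full`, `cv_threshold`, `min_samples`, and `max_stored` are hypothetical, leaves here buffer only target values (no feature vectors), and both the sort key (target value) and the choice of which half to drop (every other sorted value) are assumptions, since the abstract does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Leaf:
    """Hypothetical leaf that buffers the target values routed to it."""
    targets: list[float] = field(default_factory=list)


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the absolute mean."""
    if len(values) < 2:
        return 0.0
    mu = mean(values)
    sd = pstdev(values)
    if mu == 0:
        # CV is undefined at zero mean; treat constant data as "no variation".
        return 0.0 if sd == 0 else float("inf")
    return sd / abs(mu)


def should_split(leaf: Leaf, cv_threshold: float, min_samples: int) -> bool:
    """Split once the leaf holds enough samples and they vary too much."""
    return (
        len(leaf.targets) >= min_samples
        and coefficient_of_variation(leaf.targets) > cv_threshold
    )


def discard_if_full(leaves: list[Leaf], max_stored: int) -> None:
    """If the stored samples exceed max_stored, halve the fullest leaf."""
    if sum(len(leaf.targets) for leaf in leaves) <= max_stored:
        return
    fullest = max(leaves, key=lambda leaf: len(leaf.targets))
    fullest.targets.sort()
    # Assumption: keep every other sorted value so the retained half
    # still spans the range of targets seen in this leaf.
    fullest.targets = fullest.targets[::2]
```

In an incremental setting, a tree would presumably call `should_split` on the receiving leaf after each new sample and `discard_if_full` whenever the storage budget is exceeded; how the split feature and split point are then chosen is outside what the abstract describes.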

       
