Abstract:
With the development of smart gas grids and improvements in gas metering, the volume of gas consumption data is increasing exponentially, which poses new challenges for gas consumption prediction. Traditional prediction algorithms based on offline batch learning can hardly meet the requirements of real-time prediction on big data. To handle gas consumption data that arrives incrementally, this paper proposes an incremental random forest regression algorithm (IRFR). First, a certain number of samples are stored in leaf nodes according to their feature values. Second, a threshold on the coefficient of variation controls the splitting of leaf nodes, which allows the model to accommodate samples that arrive incrementally. In addition, for the case in which the number of stored samples exceeds the maximum sample storage, this paper proposes a sample discard strategy: find the leaf node that stores the most samples, sort its samples, and discard half of them. The proposed algorithm can thus adapt better to the big data setting. Finally, a simulation experiment is conducted on gas consumption prediction in Shanghai, comprising data preprocessing, feature selection, IRFR model training, and model parameter tuning. In particular, the weather, the special characteristics of the date, and historical consumption data are chosen as features. Experimental results show that, compared with the traditional random forest based on batch learning, the proposed IRFR can attain the same prediction accuracy with less training time, especially on big data. Consequently, IRFR is more suitable than the traditional random forest for incremental learning on big data.
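The leaf-splitting rule and the sample discard strategy described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the function names, the use of the population standard deviation, the `min_samples` guard, sorting by arrival order, and dropping the older half of the samples. The abstract does not specify the sort key or which half is discarded.

```python
from statistics import mean, pstdev


def coefficient_of_variation(targets):
    """Return std / |mean| of the target values stored in a leaf."""
    m = mean(targets)
    s = pstdev(targets)  # population standard deviation (assumption)
    if m == 0:
        # Avoid division by zero; treat any spread around zero as unbounded.
        return float("inf") if s > 0 else 0.0
    return s / abs(m)


def should_split(targets, cv_threshold, min_samples=2):
    """Split a leaf once its stored targets vary more than the threshold.

    `min_samples` is a hypothetical guard, not taken from the paper.
    """
    if len(targets) < min_samples:
        return False
    return coefficient_of_variation(targets) > cv_threshold


def discard_from_fullest_leaf(leaves):
    """Halve the leaf that stores the most samples and return its key.

    `leaves` maps a leaf id to a list of (arrival_index, target) pairs.
    Assumption: samples are sorted by arrival order and the older half
    is dropped.
    """
    fullest = max(leaves, key=lambda leaf_id: len(leaves[leaf_id]))
    ordered = sorted(leaves[fullest], key=lambda sample: sample[0])
    leaves[fullest] = ordered[len(ordered) // 2:]
    return fullest
```

In such a sketch, `discard_from_fullest_leaf` would be called whenever the total number of stored samples exceeds the maximum sample storage, and `should_split` would be checked each time a new sample is routed to a leaf.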