Method-Level Bug Prediction Using Code Commit Information

Dong Hang (董杭); Yu Huiqun (虞慧群)

doi:10.14135/j.cnki.1006-3080.20250422002

Abstract

Software bug prediction is an important means of software quality assurance and has become an active research area in software engineering. However, existing prediction techniques face two main challenges. First, predictions are too coarse-grained to meet practical industrial needs. Second, models adapt poorly to dynamic development processes: they rely heavily on static code features and historical data, and so struggle to capture code changes and commit information. To address these issues, this paper proposes a method-level bug prediction framework based on multi-dimensional commit features, with the aim of improving prediction accuracy. The core contributions are as follows. (1) A new set of features derived from code commit information is proposed and combined with traditional code and history features to build a more comprehensive multi-dimensional feature space; the resulting model significantly outperforms existing techniques on 17 open-source projects. (2) SHAP feature importance analysis confirms that the commit features have strong predictive power, and it improves model interpretability. (3) The model is further simplified using the identified key features, balancing efficiency and accuracy. Experimental results show that incorporating commit information improves AUC by an average of 4.3%, F1 by 8.4%, and MCC by 17.7%.
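For readers less familiar with two of the reported metrics, the following is a minimal sketch of how F1 and MCC are computed from a binary confusion matrix. The function name `f1_and_mcc` and the counts in the example are hypothetical illustrations; they are not taken from the paper or its 17 projects.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1, MCC) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives (buggy = positive class).
    """
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four cells, so true negatives also affect the score.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


# Hypothetical counts for an imbalanced data set (50 buggy methods out of 1000).
f1, mcc = f1_and_mcc(tp=30, fp=20, tn=930, fn=20)
print(f"F1={f1:.3f}, MCC={mcc:.3f}")
```

F1 ignores true negatives while MCC accounts for every cell of the confusion matrix, which is one common reason for reporting both when buggy methods are a small minority of the data.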
