    Risk-Aware Action-Masked Reinforcement Learning for Optimal Power Flow

    • Abstract: Optimal Power Flow (OPF) is a key problem in the economic and secure operation of power systems. As the uncertainty of renewable generation and the volatility of load increase, solving OPF involves more complex operating constraints and greater uncertainty. To improve safe exploration and constraint satisfaction under complex operating conditions, this paper proposes a risk-aware, action-masked reinforcement learning method for OPF. The method formulates the OPF solution process as a constrained Markov decision process and introduces a risk critic network and an action masking mechanism into the Reward Constrained Policy Optimization (RCPO) framework. The risk critic network predicts constraint risk from the system state and control action, providing safety guidance for policy updates. The action masking mechanism trains a mask network based on power flow sensitivity, generates a mask over the control dimensions, and modulates the exploration variance of the Gaussian policy, which reduces high-risk action perturbations. Experimental results show that the proposed method reduces cost deviation and constraint violations while maintaining a high feasibility rate. Ablation studies verify the effectiveness of the risk critic network and the action masking mechanism, indicating that the method balances feasibility, economic performance, and training stability in solving OPF.
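The abstract says the mask modulates the exploration variance of the Gaussian policy per control dimension. As a rough illustration of that idea only, the sketch below shrinks the standard deviation of a diagonal Gaussian in dimensions that a mask flags as high-risk. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `masked_gaussian_sample`, the convention that a mask value of 1 means "most restricted", the linear scaling rule, and the `min_scale` floor. In the paper the mask comes from a trained mask network; here it is simply passed in as a list.

```python
import random


def masked_gaussian_sample(mean, base_std, mask, min_scale=0.1, rng=None):
    """Sample an action from a diagonal Gaussian with mask-scaled std.

    mask[i] is clipped to [0, 1]; 0 leaves exploration untouched and 1
    shrinks the std to min_scale * base_std[i] (hypothetical convention).
    Returns the sampled action and the effective per-dimension std.
    """
    if not (len(mean) == len(base_std) == len(mask)):
        raise ValueError("mean, base_std and mask must have the same length")
    rng = rng or random.Random()
    action, effective_std = [], []
    for mu, sigma, m in zip(mean, base_std, mask):
        m = min(max(m, 0.0), 1.0)                 # keep the mask in [0, 1]
        scale = 1.0 - (1.0 - min_scale) * m       # linear shrink toward min_scale
        s = sigma * scale
        effective_std.append(s)
        action.append(rng.gauss(mu, s))           # per-dimension Gaussian draw
    return action, effective_std


# Example: two control dimensions, the second flagged as high-risk.
example_action, example_std = masked_gaussian_sample(
    mean=[1.0, 0.5],
    base_std=[0.2, 0.2],
    mask=[0.0, 1.0],
    rng=random.Random(0),
)
```

With these inputs the first dimension keeps its full standard deviation, while the second is reduced to one tenth of it, so sampled actions perturb the flagged control far less.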

       
