Risk-Aware Action-Masked Reinforcement Learning for Optimal Power Flow
Abstract
Optimal power flow (OPF) is central to the economic and secure operation of power systems. As renewable generation and load become more variable, OPF must handle more complex operating constraints under greater uncertainty. To improve safe exploration and constraint satisfaction under these conditions, this paper proposes a risk-aware, action-masked reinforcement learning method for OPF. The method formulates the OPF solution process as a constrained Markov decision process and adds a risk critic network and an action masking mechanism to the Reward Constrained Policy Optimization (RCPO) framework. The risk critic network predicts constraint risk from the system state and control action, providing safety guidance for policy updates. The action masking mechanism trains a mask network based on power flow sensitivity, generates a mask for each control dimension, and uses it to modulate the exploration variance of the Gaussian policy, which reduces high-risk action perturbations. Experimental results show that the proposed method reduces cost deviation and constraint violations while maintaining a high feasibility rate. Ablation studies further confirm the contribution of the risk critic network and the action masking mechanism, indicating that the method better balances feasibility, economic performance, and training stability when solving OPF.
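The two mechanisms named above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a mask in [0, 1] per control dimension, where 1 means the dimension is safe to explore, and a linear scaling of the Gaussian standard deviation with a floor `min_scale`. The mask convention, the floor, the scaling rule, and all function names are hypothetical. The penalized reward and the multiplier update follow the standard RCPO form (reward minus multiplier times constraint cost, with a projected ascent step on the multiplier). The mask network and the risk critic network themselves are not reproduced here.

```python
import numpy as np

def masked_gaussian_sample(mean, log_std, mask, rng, min_scale=0.1):
    """Sample an action whose per-dimension exploration std is scaled by a mask.

    Assumption: mask values lie in [0, 1]; 1 keeps the nominal std and
    0 shrinks it to `min_scale` times the nominal std (hypothetical rule).
    """
    mask = np.clip(mask, 0.0, 1.0)
    scale = min_scale + (1.0 - min_scale) * mask   # linear interpolation
    std = np.exp(log_std) * scale                  # masked exploration std
    action = mean + std * rng.standard_normal(mean.shape)
    return action, std

def rcpo_penalized_reward(reward, constraint_cost, lam):
    """RCPO-style reward: the constraint cost is penalized by the multiplier."""
    return reward - lam * constraint_cost

def update_multiplier(lam, avg_constraint_cost, threshold, lr=0.01):
    """Projected ascent step: the multiplier grows when the average
    constraint cost exceeds the threshold and never drops below zero."""
    return max(0.0, lam + lr * (avg_constraint_cost - threshold))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean = np.zeros(3)
    log_std = np.zeros(3)                 # nominal std of 1 in every dimension
    mask = np.array([1.0, 0.5, 0.0])      # hypothetical mask network output
    action, std = masked_gaussian_sample(mean, log_std, mask, rng)
    print("masked std:", std)
    print("action:", action)
```

In this sketch a dimension flagged as risky (mask near 0) still receives a small amount of exploration noise rather than none, which is one possible design choice. A hard mask that freezes the dimension would correspond to `min_scale=0`.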