Abstract:
A safe reinforcement learning method is proposed for gasoline blending scheduling under random disturbances in the properties and flow rates of direct-feed component oils. The scheduling task requires the coordinated optimization of discrete scheduling decisions and continuous blending recipes under strict product quality and inventory constraints. To solve this problem, the scheduling process is formulated in a hybrid action space, so that discrete product switching decisions and continuous recipe regulation can be handled in a unified framework. On this basis, a multi-pass hybrid soft actor-critic (MPH-SAC) algorithm is developed for gasoline blending scheduling. The multi-pass structure removes spurious gradients in hybrid action evaluation and improves the coordination between discrete scheduling decisions and continuous blending recipes. In addition, a Lagrangian multiplier mechanism regulated by a proportional-integral-derivative (PID) controller is introduced to dynamically enforce safety constraints while maintaining economic performance during learning. The proposed method is tested in a gasoline blending scheduling environment with uncertain direct-feed component oil conditions. Simulation results show that the proposed method achieves zero violations of product quality and inventory constraints under random disturbances. The average online decision time is about 5 ms, whereas the deterministic optimization model solved by Gurobi requires about 394 s to obtain a near-optimal solution. Although the average profit of the proposed method is about 2.5% lower than the deterministic optimum, the method remains robust to disturbances in the properties and flow rates of direct-feed component oils and avoids the model mismatch caused by deterministic assumptions.
These results demonstrate that the proposed method effectively balances profitability, operational safety, and real-time decision requirements, and that it provides a practical solution for real-time gasoline blending scheduling in complex and uncertain operating environments.
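
The abstract does not state the update rule of the PID-regulated Lagrangian multiplier, so the following is only a minimal sketch of one common form of such a mechanism, not the authors' implementation. The class name `PIDLagrangian`, the gain values, the `cost_limit` parameter, and the helper `penalized_reward` are all hypothetical and introduced here for illustration. The multiplier grows when the measured constraint cost exceeds its limit and shrinks toward zero when the constraint is satisfied.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangian:
    """Illustrative PID-regulated Lagrange multiplier (all gains are assumed values)."""
    kp: float = 0.1          # proportional gain (assumption)
    ki: float = 0.01         # integral gain (assumption)
    kd: float = 0.05         # derivative gain (assumption)
    cost_limit: float = 0.0  # allowed constraint cost; 0 means no violation is tolerated
    integral: float = 0.0    # accumulated constraint error, clipped at zero
    prev_cost: float = 0.0   # constraint cost seen at the previous update
    multiplier: float = 0.0  # current Lagrange multiplier, always non-negative

    def update(self, episode_cost: float) -> float:
        """Update the multiplier from the latest measured constraint cost."""
        error = episode_cost - self.cost_limit
        # Integral term: accumulate the error but never let it go negative.
        self.integral = max(0.0, self.integral + error)
        # Derivative term: react only to increases in the constraint cost.
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Project onto the non-negative reals, as required for a Lagrange multiplier.
        self.multiplier = max(0.0, raw)
        return self.multiplier


def penalized_reward(profit: float, cost: float, multiplier: float) -> float:
    """Lagrangian-relaxed reward: economic profit minus the weighted constraint cost."""
    return profit - multiplier * cost
```

In a training loop, `update` would be called with the quality or inventory violation cost measured over recent episodes, and the returned multiplier would weight that cost in the actor's objective. Compared with a purely integral (gradient-ascent) multiplier update, the proportional and derivative terms are intended to damp oscillations around the constraint boundary.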