Gasoline Blending Scheduling Optimization Based on Hybrid-Action Safe Reinforcement Learning

石林林; 朱志伟; 赵云蒙

doi:10.14135/j.cnki.1006-3080.20260222001


Abstract

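Summary: Traditional optimization methods for gasoline blending scheduling struggle to balance real-time performance, safety, and model adaptability. To address this, a blending scheduling optimization method based on hybrid-action safe reinforcement learning is proposed, aiming at efficient and safe decision-making under random disturbances. The method uses a multi-pass forward propagation (Multi-pass) structure to eliminate spurious gradients in hybrid action evaluation, and combines it with a Lagrangian multiplier mechanism regulated by proportional-integral-derivative (PID) control to handle quality and inventory constraints. Simulation results show that the proposed method achieves zero violations of product quality and inventory constraints under random disturbances, with a single-step decision time of about 5 ms. Although its average profit is 2.5% lower than the deterministic optimal solution, the method still shows superior robustness and potential for real-time application in complex disturbance environments.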
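The PID-regulated Lagrangian multiplier mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PIDLagrangian`, the gain values, the `cost_limit` parameter, and the `penalized_reward` helper are all assumptions chosen for illustration, and the update follows one common form of PID Lagrangian control in which the integral and derivative terms are clipped at zero.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangian:
    """Illustrative PID-regulated Lagrange multiplier for one constraint."""
    kp: float = 0.5          # proportional gain (assumed value)
    ki: float = 0.1          # integral gain (assumed value)
    kd: float = 0.2          # derivative gain (assumed value)
    cost_limit: float = 0.0  # allowed constraint cost; 0 means no violation
    integral: float = 0.0    # accumulated violation, kept non-negative
    prev_cost: float = 0.0   # cost seen at the previous update

    def update(self, episode_cost: float) -> float:
        """Return the new multiplier given the latest measured constraint cost."""
        error = episode_cost - self.cost_limit
        # Integral term accumulates violation and never drops below zero.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only when the cost is rising.
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # The multiplier itself must be non-negative.
        return max(0.0, raw)


def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Lagrangian objective: economic reward minus weighted constraint cost."""
    return reward - lam * cost
```

In a training loop one such controller would typically be kept per constraint (for example one for quality and one for inventory, which is an assumption here), with `update` called after each batch of episodes and the returned multiplier used in `penalized_reward`.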

Abstract: A safe reinforcement learning method is proposed for gasoline blending scheduling under random disturbances in the properties and flow rates of direct-feed component oils. The scheduling task requires the coordinated optimization of discrete scheduling decisions and continuous blending recipes under strict product quality and inventory constraints. To solve this problem, the scheduling process is formulated in a hybrid action space, so that discrete product switching decisions and continuous recipe regulation can be handled in a unified framework. On this basis, a Multi-pass hybrid soft actor-critic algorithm (MPH-SAC) is developed for gasoline blending scheduling. The Multi-pass structure removes spurious gradients in hybrid action evaluation and improves the coordination between discrete scheduling decisions and continuous blending recipes. In addition, a proportional-integral-derivative (PID) regulated Lagrangian multiplier mechanism is introduced to dynamically enforce safety constraints while maintaining economic performance during learning. The proposed method is tested in a gasoline blending scheduling environment with uncertain direct-feed component oil conditions. Simulation results show that the proposed method achieves zero violations of product quality and inventory constraints under random disturbances. The average online decision time is about 5 ms, whereas the deterministic optimization model solved by Gurobi requires about 394 s to obtain a near-optimal solution. Although the average profit of the proposed method is about 2.5% lower than the deterministic optimum, it maintains strong robustness against disturbances in the properties and flow rates of direct-feed component oils and avoids the model mismatch caused by deterministic assumptions. These results demonstrate that the proposed method can effectively balance profitability, operational safety, and real-time decision requirements, and provides a practical solution for real-time gasoline blending scheduling in complex and uncertain operating environments.
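The idea behind a multi-pass evaluation of hybrid actions can be sketched as follows. This is a hedged illustration of the general multi-pass technique, not the MPH-SAC critic itself: the function names `multi_pass_q` and `make_toy_q_net`, the toy two-layer network, and all dimensions are hypothetical. The critic is called once per discrete action with the continuous parameters of every other action zeroed out, so the value of action k cannot depend on parameters that belong to a different action.

```python
import math
import random


def multi_pass_q(q_net, state, params, param_sizes):
    """Evaluate Q(s, k, x_k) for each discrete action k with one pass per action.

    params is the concatenation of the continuous parameter blocks of all
    discrete actions; param_sizes gives the length of each block.
    """
    q_values = []
    start = 0
    for k, size in enumerate(param_sizes):
        # Keep only block k; all other blocks are replaced by zeros.
        masked = [0.0] * len(params)
        masked[start:start + size] = params[start:start + size]
        # Read only the k-th output of the pass that saw block k.
        q_values.append(q_net(list(state) + masked)[k])
        start += size
    return q_values


def make_toy_q_net(input_dim, n_actions, hidden=8, seed=0):
    """Build a small random tanh network standing in for a learned critic."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(input_dim)]
    w2 = [[rng.gauss(0, 1) for _ in range(n_actions)] for _ in range(hidden)]

    def q_net(x):
        h = [math.tanh(sum(x[i] * w1[i][j] for i in range(input_dim)))
             for j in range(hidden)]
        return [sum(h[j] * w2[j][a] for j in range(hidden))
                for a in range(n_actions)]

    return q_net
```

With a three-dimensional state and two discrete actions that each carry two continuous parameters, changing the second action's parameters leaves the first action's value unchanged, which is the property that removes the spurious dependence:

```python
q_net = make_toy_q_net(input_dim=7, n_actions=2)
state = [1.0, 0.5, -0.5]
q_a = multi_pass_q(q_net, state, [0.1, 0.2, 0.3, 0.4], [2, 2])
q_b = multi_pass_q(q_net, state, [0.1, 0.2, 0.9, -0.9], [2, 2])
print(q_a[0] == q_b[0])
```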
