
    Optimal Control of Batch Processes Based on a Double-Actor Double-Critic Algorithm

    Optimal Control of Batch Process Based on Improved Deep Reinforcement Learning

    • Abstract: When deep reinforcement learning (DRL) is applied to the optimal control of batch processes, it suffers from low sample efficiency, long training times, and unstable training. To address these problems, a Double Actors and Regularized Double Critics (DARDC) algorithm is proposed. The algorithm strengthens the agent's exploration of the complex state space through parallel double actor networks, suppresses value-function estimation bias by constraining the double critic networks with a regularization term, and introduces prioritized experience replay (PER) to accelerate training convergence. In addition, expert-experience rules are embedded in the action output to keep the agent's control actions within reasonable bounds. In experiments on a penicillin fermentation process, the proposed algorithm outperforms conventional deep reinforcement learning algorithms, and compared with PID control it improves the key control index, average penicillin yield, by 19.1%. The results show that the DARDC algorithm is effective and robust and provides a new approach to the optimal control of batch processes.
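    The abstract names the ingredients of DARDC (double actors, regularized double critics, PER, and expert-rule action shaping) but does not give the update equations. The following is a minimal, hypothetical PyTorch sketch of how such a critic update might be assembled on a TD3-style backbone; all names (`dardc_targets`, `expert_clip`, `reg_coef`) and the specific choices (taking the better of the two actors' action proposals, an MSE regularizer pulling the two critics together, TD-error-based PER priorities) are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a DARDC-style critic update (TD3-like backbone assumed).
# Actor/critic networks are passed in as callables; names and hyperparameters
# are illustrative, not taken from the paper.

def expert_clip(action, low, high):
    """Embed a simple expert rule: keep the control action inside known safe bounds."""
    return torch.clamp(action, low, high)

def dardc_targets(batch, actor1_t, actor2_t, critic1_t, critic2_t,
                  gamma=0.99, act_low=-1.0, act_high=1.0):
    """Bootstrapped targets using two target actors and two target critics.

    Each target actor proposes a next action; each proposal is scored by the
    minimum of the two target critics (clipped double-Q), and the final target
    keeps the better proposal, which is one plausible way to use double actors.
    """
    s2, r, done = batch["next_state"], batch["reward"], batch["done"]
    with torch.no_grad():
        a1 = expert_clip(actor1_t(s2), act_low, act_high)
        a2 = expert_clip(actor2_t(s2), act_low, act_high)
        q_a1 = torch.min(critic1_t(s2, a1), critic2_t(s2, a1))
        q_a2 = torch.min(critic1_t(s2, a2), critic2_t(s2, a2))
        q_next = torch.max(q_a1, q_a2)  # keep the better of the two proposals
        return r + gamma * (1.0 - done) * q_next

def dardc_critic_loss(batch, critic1, critic2, targets, is_weights, reg_coef=0.1):
    """TD loss for both critics plus a regularizer that pulls their estimates
    together (one simple reading of 'regularized double critics'); PER
    importance weights reweight each transition, and new priorities come from
    the TD error as usual."""
    s, a = batch["state"], batch["action"]
    q1, q2 = critic1(s, a), critic2(s, a)
    td = (is_weights * (q1 - targets) ** 2).mean() \
       + (is_weights * (q2 - targets) ** 2).mean()
    reg = reg_coef * F.mse_loss(q1, q2)  # discourage divergent value estimates
    new_priorities = (q1 - targets).abs().detach() + 1e-6  # PER priority update
    return td + reg, new_priorities
```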

       
