Abstract:
Deep reinforcement learning methods applied to the optimal control of batch processes suffer from low sample efficiency, long training times, and unstable training. To address these problems, a Double Actors and Regularized Double Critics (DARDC) algorithm is proposed. The algorithm uses two parallel actor networks to strengthen the agent's exploration of complex state spaces, constrains the double critic networks with a regularization term to suppress value-function estimation bias, and introduces prioritized experience replay to accelerate training convergence. In addition, to ensure that the control actions output by the agent are reasonable, expert experience rules are embedded into the action output. In experiments on a penicillin fermentation process, the proposed algorithm outperforms traditional deep reinforcement learning algorithms and, compared with PID control, increases the average penicillin yield, the key control index, by 19.1%. The results indicate that the DARDC algorithm is effective and robust and provides a new method for the optimal control of batch processes.
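As an illustration of the prioritized experience replay mechanism mentioned above, the following is a minimal sketch of the standard proportional variant, in which transitions with larger temporal-difference error are replayed more often and importance-sampling weights correct the resulting bias. It is not the DARDC implementation: the class name, the method names, and the values of alpha, beta, and eps are assumptions chosen for the example, and the abstract does not state which replay variant or hyperparameters are used.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, hypothetical names)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (assumed value)
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be replayed before being re-scored.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4, rng=random):
        probs = self.probabilities()
        n = len(self.data)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalized here by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update(self, indices, td_errors):
        # Re-score sampled transitions with their latest absolute TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop of this kind, the critic loss for each sampled transition would typically be multiplied by its returned weight, and update would be called with the new TD errors after each gradient step.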