Abstract:
When deep reinforcement learning methods are used to optimize the control of batch processes, they suffer from low sample efficiency, long training times, and unstable training. To address these problems, a double-actor and regularized double-critic (DARDC) algorithm is proposed. The algorithm enhances the agent's exploration of the complex state space through parallel double actor networks, suppresses value-function estimation bias by constraining the double critic networks with a regularization term, and introduces a prioritized experience replay mechanism to accelerate training convergence. Furthermore, to ensure that the control actions output by the agent remain reasonable, expert-experience rules are embedded in the action output. In experiments on the penicillin fermentation process, the proposed algorithm outperforms traditional deep reinforcement learning algorithms; compared with PID control, the average penicillin yield, a key control index, is increased by 19.1%. The results show that the DARDC algorithm is effective and robust, and it provides a new approach to the optimal control of batch processes.
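The core mechanisms named in the abstract can be illustrated with a minimal sketch. The following is not the paper's implementation; it is a toy linear-function-approximation version under assumed details: two actors each propose an action and the one scored higher by the (clipped) critics is kept, two critics are trained toward a clipped double-Q target with a disagreement-penalty regularizer (weight `REG_LAMBDA`, a hypothetical hyperparameter), and a simple clip stands in for the expert-rule constraint on the action output. All dimensions and learning rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 4, 1
GAMMA, LR, REG_LAMBDA = 0.99, 1e-2, 0.1   # illustrative hyperparameters
ACTION_LO, ACTION_HI = -1.0, 1.0           # stand-in for expert-rule bounds

# Linear critics Q_i(s, a) = w_i . [s; a]; linear actors pi_j(s) = tanh(A_j s).
w1 = rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM)
w2 = rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM)
A1 = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
A2 = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))

def q(w, s, a):
    return float(w @ np.concatenate([s, a]))

def select_action(s):
    # Double actors: each actor proposes an action; keep the one that the
    # pessimistic (min over both critics) value estimate scores higher.
    cands = [np.tanh(A1 @ s), np.tanh(A2 @ s)]
    vals = [min(q(w1, s, a), q(w2, s, a)) for a in cands]
    a = cands[int(np.argmax(vals))]
    # Expert-rule stand-in: keep the control action in a reasonable range.
    return np.clip(a, ACTION_LO, ACTION_HI)

def critic_update(s, a, r, s2):
    global w1, w2
    a2 = select_action(s2)
    # Clipped double-Q target suppresses overestimation; the regularization
    # term additionally penalizes disagreement between the two critics.
    target = r + GAMMA * min(q(w1, s2, a2), q(w2, s2, a2))
    x = np.concatenate([s, a])
    q1, q2 = q(w1, s, a), q(w2, s, a)
    grad1 = ((q1 - target) + REG_LAMBDA * (q1 - q2)) * x
    grad2 = ((q2 - target) + REG_LAMBDA * (q2 - q1)) * x
    w1 = w1 - LR * grad1
    w2 = w2 - LR * grad2
    return target
```

Prioritized experience replay would sit on top of this loop, sampling stored transitions in proportion to their temporal-difference error rather than uniformly; it is omitted here for brevity.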