Ice Crystal Continuous Optimization Algorithm Based on Reinforcement Learning and Angle Penalty Distance

Xu Yi; Feng Xiang; Yu Huiqun

doi:10.14135/j.cnki.1006-3080.20191125003



Abstract

Abstract: Global optimization problems arise in many fields, but traditional methods rely mainly on gradient information of the objective function. Metaheuristic search algorithms are more flexible and can be applied to practical problems. For the global continuous optimization problem, this paper therefore proposes a continuous crystal energy optimization algorithm based on reinforcement learning and angle penalty distance (APD-CEO), which combines a probabilistic update strategy based on reinforcement learning with a deviation strategy based on angle penalty distance. First, the ice crystal continuous optimization algorithm is proposed to solve continuous extremum problems by simulating the natural freezing of lake water. Second, the angle penalty distance is introduced when selecting the lake center; it better balances convergence and diversity and eliminates the energy calculation error caused by a temporary lake center. Meanwhile, the probabilistic update based on reinforcement learning better guides the positions of newly formed crystals, accelerates the freezing of the lake, and approaches the lake center (the global optimum) faster. Finally, to verify the effectiveness of the probabilistic update and angle penalty distance strategies, the algorithm is compared with and without these strategies. APD-CEO is also compared with four other algorithms on 12 benchmark functions: it performs better on most of them, with a more pronounced advantage in high dimensions, and the Friedman test ranks it first among the five algorithms.
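The abstract names the angle penalty distance but does not give its formula. As a rough illustration only, the sketch below uses the angle-penalized distance as defined for the RVEA many-objective algorithm, d = (1 + M (t / t_max)^α θ / γ) ‖f‖, which may differ from the variant used in APD-CEO. The function name, parameter names, and default `alpha` are assumptions made for this example and do not come from the paper.

```python
import math


def angle_penalized_distance(f, v, gamma, t, t_max, alpha=2.0):
    """Angle-penalized distance of vector f with respect to reference vector v.

    Assumption: this is the RVEA-style definition, not necessarily the
    paper's. f is a translated objective (or position) vector, v a
    reference direction, gamma a normalizing angle, t the current
    iteration and t_max the iteration budget.
    """
    m = len(f)
    norm_f = math.sqrt(sum(x * x for x in f))
    if norm_f == 0.0:
        return 0.0  # the origin has zero distance regardless of angle
    norm_v = math.sqrt(sum(x * x for x in v))
    cos_theta = sum(a * b for a, b in zip(f, v)) / (norm_f * norm_v)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    theta = math.acos(cos_theta)
    # The penalty grows with the iteration count: early iterations favor
    # convergence (small norm), later ones favor a small angle to v.
    penalty = m * (t / t_max) ** alpha * theta / gamma
    return (1.0 + penalty) * norm_f
```

With `t = 0` the penalty vanishes and the value reduces to the plain norm of `f`; as `t` approaches `t_max`, candidates that deviate in angle from the reference direction are penalized more heavily, which is one way to trade convergence against diversity.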
