Journal of Civil Aviation University of China ›› 2024, Vol. 42 ›› Issue (5): 59-65.


Improved reinforcement learning algorithm for mobile robot path planning

ZHANG Wei 1a,2,3, CHU Zeyuan 1b, YANG Yutao 1a, WANG Wei 1a

  1. a. College of Aeronautical Engineering; b. College of Safety Science and Engineering, CAUC, Tianjin 300300, China;
     2. Aviation Special Ground Equipment Research Base, CAAC, Tianjin 300300, China;
     3. Key Laboratory of Smart Airport Theory and System, CAAC, Guangzhou 510470, China
  • Received: 2023-01-07  Revised: 2023-05-04  Online: 2024-12-21  Published: 2025-04-08
  • About the author: ZHANG Wei (1979- ), male, born in Hengyang, Hunan; professor, Ph.D.; research interests include intelligent civil aviation equipment and robotics
  • Funding: Key Program of the Joint Fund for Civil Aviation Research, National Natural Science Foundation of China (U2033208); Tianjin Postgraduate Research and Innovation Project (2021YJSS122)

Abstract:

To address the poor path smoothness, slow convergence, and low learning efficiency of the traditional Q-learning algorithm, this paper proposes an improved Q-learning algorithm for mobile robot path planning. First, the action set is selected according to the obstacle density and the relative positions of the start and goal points, which accelerates the convergence of the Q-learning algorithm. Second, a continuous heuristic factor is added to the reward function; it combines the distance from the current point to the goal with the distances from the current point to all obstacles in the map and to the map boundary. Finally, a scale factor is introduced when initializing the Q-value table, providing the mobile robot with prior environment information, and the proposed improved Q-learning algorithm is verified by simulation on a grid map. The simulation results show that the improved Q-learning algorithm converges significantly faster than the traditional Q-learning algorithm and adapts better to complex environments, which verifies the superiority of the improved algorithm.
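The heuristic reward factor and the prior Q-table initialization described in the abstract can be sketched in outline. The snippet below is an illustrative Python sketch under stated assumptions, not the paper's implementation: the exact functional form of the heuristic factor, the weights `w_goal` and `w_obs`, the 8-action set, and the default `scale` value are all assumptions, since the abstract does not give the formulas.

```python
import numpy as np

def heuristic_factor(pos, goal, obstacles, grid_size, w_goal=1.0, w_obs=0.5):
    """Continuous heuristic factor added to the reward (illustrative form):
    penalise distance to the goal, reward clearance from obstacles and
    from the map boundary."""
    d_goal = np.linalg.norm(np.subtract(goal, pos))
    # distance to the nearest obstacle (infinite if the map has none)
    d_obs = min((np.linalg.norm(np.subtract(o, pos)) for o in obstacles),
                default=float("inf"))
    # distance to the nearest map boundary
    d_edge = min(pos[0], pos[1],
                 grid_size - 1 - pos[0], grid_size - 1 - pos[1])
    return -w_goal * d_goal + w_obs * min(d_obs, d_edge)

def init_q_table(grid_size, n_actions, goal, scale=0.1):
    """Initialise the Q table with prior environment information:
    cells nearer the goal start with larger (less negative) values,
    weighted by the scale factor."""
    q = np.zeros((grid_size, grid_size, n_actions))
    for x in range(grid_size):
        for y in range(grid_size):
            q[x, y, :] = -scale * np.linalg.norm(np.subtract(goal, (x, y)))
    return q

goal, obstacles = (9, 9), [(5, 5), (3, 7)]
q0 = init_q_table(10, 8, goal)      # 8 actions: an assumed action set
print(q0[9, 9, 0] > q0[0, 0, 0])    # prior favours cells near the goal -> True
print(heuristic_factor((8, 8), goal, obstacles, 10)
      > heuristic_factor((1, 1), goal, obstacles, 10))  # -> True
```

Initialising the Q table this way is a form of reward/value shaping: the agent no longer starts from an all-zero table, so early exploration is biased toward the goal, which is one common way to obtain the faster convergence the abstract reports.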

