Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving

Hao Pang1, Zhenpo Wang1, Guoqiang Li1
1 School of Mechanical Engineering, Beijing Institute of Technology
Corresponding author

Abstract

Deep reinforcement learning (DRL) shows promising potential for autonomous driving decision-making. However, DRL demands extensive computational resources to obtain a qualified policy in complex driving scenarios due to its low learning efficiency. Moreover, leveraging human expert guidance to enhance DRL performance incurs prohibitively high labor costs, which limits its practical application. In this study, we propose a novel large language model (LLM) guided deep reinforcement learning (LGDRL) framework to address the decision-making problem of autonomous vehicles. Within this framework, an LLM-based driving expert is integrated into the DRL loop to provide intelligent guidance for the agent's learning process. To efficiently utilize the LLM expert's guidance and enhance the resulting decision-making policies, the learning and interaction process of DRL is augmented with an innovative expert policy constrained algorithm and a novel LLM-intervened interaction mechanism. Experimental results demonstrate that our method not only achieves superior driving performance with a 90% task success rate but also significantly improves learning efficiency and expert guidance utilization efficiency compared with state-of-the-art baseline algorithms. Moreover, the proposed method enables the DRL agent to maintain consistent and reliable performance even in the absence of LLM expert guidance.

LLM Guided DRL Paradigm

In the LLM Guided DRL Paradigm, the LLM expert, which is prompted by a prompt generator, provides guidance to enhance the learning process of the DRL agent.
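A prompt generator of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the observation field names (`lane`, `speed`, `gap`) are assumptions, while the discrete action set matches highway-env's standard `DiscreteMetaAction` space.

```python
def generate_prompt(ego, vehicles):
    """Build a textual scene description to prompt the LLM driving expert.

    ego: dict with 'lane' (int) and 'speed' (m/s).
    vehicles: list of dicts with 'lane', 'speed', and 'gap'
    (longitudinal distance to the ego vehicle, m).
    Field names are illustrative, not from the paper.
    """
    lines = [
        f"Ego vehicle: lane {ego['lane']}, speed {ego['speed']:.1f} m/s.",
        "Surrounding vehicles:",
    ]
    for i, v in enumerate(vehicles):
        lines.append(
            f"  Vehicle {i}: lane {v['lane']}, speed {v['speed']:.1f} m/s, "
            f"gap {v['gap']:.1f} m."
        )
    lines.append(
        "Available actions: 0=LANE_LEFT, 1=IDLE, 2=LANE_RIGHT, "
        "3=FASTER, 4=SLOWER. Reply with the index of the safest action."
    )
    return "\n".join(lines)
```

The returned string would be sent to the LLM expert, whose parsed action index then serves as the guidance signal for the DRL agent.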

Framework

Based on the LLM guided DRL paradigm introduced above, we propose the LLM Guided Deep Reinforcement Learning (LGDRL) framework. Within this framework, an LLM driving expert is prompted to guide the learning process of the DRL agent. A novel expert policy constrained DRL algorithm, which integrates a policy constraint based on Jensen-Shannon (JS) divergence into the learning objective, is used to facilitate the DRL agent to learn more effectively from the expert guidance. The actions applied to the environment are determined by a novel LLM-intervened interaction mechanism, which allows the LLM expert to intervene in the DRL agent actions when necessary.
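The two components named above can be sketched for discrete action distributions. This is a hedged illustration under stated assumptions, not the paper's exact objective: the SAC-style actor loss, the constraint weight `beta`, and all function names are illustrative; only the Jensen-Shannon divergence itself follows its standard definition.

```python
import numpy as np

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def constrained_actor_loss(q_values, log_pi, pi, expert_pi, alpha, beta):
    """SAC-style actor loss augmented with a JS policy-constraint term.

    beta weights the constraint pulling the agent's policy pi toward the
    LLM expert's policy expert_pi; the weighting scheme is illustrative.
    """
    sac_loss = np.sum(pi * (alpha * log_pi - q_values))
    return sac_loss + beta * js_divergence(pi, expert_pi)

def interact(agent_action, expert_action, intervene):
    """LLM-intervened interaction: the expert's action is applied to the
    environment when it chooses to intervene; otherwise the agent acts."""
    return expert_action if intervene else agent_action
```

The JS term vanishes when the agent's and expert's action distributions coincide, so the constraint fades as the agent internalizes the expert's behavior, consistent with the agent remaining reliable once expert guidance is withdrawn.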


Result

Driving performance comparison of different post-trained DRL agents

DRL agent trained with SAC+BC baseline method

DRL agent trained with SAC+RP baseline method

DRL agent trained with SAC+Demo baseline method

DRL agent trained with vanilla SAC baseline method

DRL agent trained with our proposed LGDRL method

The driving scenario is constructed using the highway-env simulator. The DRL agent trained with our proposed LGDRL method successfully completes the driving task, whereas the agents trained with the baseline methods fail to do so.

Policy Comparison of Post-trained LGDRL Agent and LLM Expert

Textual description of current frame:

LLM expert's response in current frame:

Post-trained LGDRL agent's decision in current frame:

Comparison of the decisions generated by the post-trained LGDRL agent and the LLM expert:

BibTeX

@misc{haolgdrl,
    title={Large Language Model Guided Deep Reinforcement Learning for Decision Making in Autonomous Driving},
    author={Hao Pang and Zhenpo Wang and Guoqiang Li},
    year={2024},
    eprint={2412.18511},
    archivePrefix={arXiv},
    primaryClass={cs.RO}}