
Foundation models for autonomous driving: A comprehensive survey

  • École de technologie supérieure
  • Mediterranean Institute of Technology
  • King Abdulaziz University

Research output: Contribution to journal › Short survey › peer-review

Abstract

Large Language Models (LLMs) have shown remarkable proficiency in a range of information-processing tasks, including data extraction, literature summarization, content generation, predictive modeling, decision-making, and system control. Vision-Language Models (VLMs) and Multimodal LLMs (MLLMs), collectively referred to in this work as Cross-modal Language Models (XLMs), integrate multiple data modalities with language understanding, thereby advancing Autonomous Driving Systems (ADS). On the implemented Artificial Intelligence (AI) side, we analyze core techniques such as prompt engineering, supervised fine-tuning, reinforcement learning from human feedback, knowledge distillation, quantization and pruning, and safety alignment/verification, together with edge-aware deployment strategies. On the application-of-AI side, we map XLM capabilities to the driving stack, including perception, prediction, planning, control, and human–machine interaction/vehicle-to-everything, and summarize how XLMs improve scene understanding, intent forecasting, decision-making, and closed-loop control by coupling natural-language reasoning with multimodal sensory inputs such as panoramic images, Light Detection and Ranging (LiDAR), and radar. In this survey, we synthesize the state of XLMs for ADS: we review the relevant literature on ADS and XLMs, including their architectures, tools, and frameworks. We then compare deployment approaches across the driving stack and summarize datasets, simulators, and benchmarks for both open- and closed-loop evaluation. Finally, we analyze key challenges, including grounding and hallucination, long-tail robustness, real-time and resource constraints, safety alignment and verification, and data governance and privacy, and outline research directions toward safe, efficient, and trustworthy XLM-enabled ADS.

Original language: English
Article number: 114805
Journal: Engineering Applications of Artificial Intelligence
Volume: 176
DOIs
Publication status: Published - 15 Jul 2026

Keywords

  • Application of artificial intelligence
  • Autonomous Driving Systems
  • Cross-modal Language Models
  • Datasets and simulators implemented artificial intelligence
  • Decision making and planning
  • Edge deployment for real-time inference
  • Foundation Models
  • Large Language Models
  • Multimodal large language models
  • Perception, prediction, planning, and control
  • Prompt engineering
  • Reinforcement Learning from Human Feedback
  • Safety alignment and verification
  • Vision Foundation Models
  • Vision-Language Models
