Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

2402.14083

313

Published 4/30/2024 by Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian

cs.AI

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Abstract

While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^

$ search algorithm. We fine tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^

$ implementation that was used for training initially. In our training method, $A^*$'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10$times$ smaller model size and a 10$times$ smaller training dataset. Lastly, we demonstrate how Searchformer scales to larger and more complex decision making tasks with improved percentage of solved tasks and shortened search dynamics.

Get summaries of the top AI research delivered straight to your inbox:

Overview

• This paper proposes a novel method called "Search Dynamics Bootstrapping" (SDB) that uses transformer-based models to improve planning performance beyond traditional A* search algorithms. • The method leverages the representational power of transformers to learn effective search strategies from data, allowing for more efficient and accurate planning in complex environments. • The paper evaluates SDB on various planning problems, including stream-search, motion planning, and partially observable planning, demonstrating significant improvements over existing approaches.

Plain English Explanation

The paper introduces a new way to improve planning algorithms, which are used to find the best sequence of actions to achieve a goal. Traditionally, algorithms like A* search have been used, but they have limitations. The researchers propose using a special type of AI model called a transformer, which is good at learning patterns from data.

The key idea is to use transformers to learn effective search strategies from previous planning problems, rather than relying solely on the traditional A* algorithm. This allows the planning system to become more efficient and accurate, especially in complex environments where traditional approaches struggle.

The paper tests this "Search Dynamics Bootstrapping" (SDB) method on a variety of planning problems, including stream-search, motion planning, and partially observable planning. In all cases, SDB is shown to outperform traditional planning algorithms, demonstrating the power of using transformers to learn effective search strategies.

Technical Explanation

The paper presents a novel approach called "Search Dynamics Bootstrapping" (SDB) that leverages the representational power of transformer models to improve planning performance beyond traditional A* search algorithms. The key insight is that transformers can learn effective search strategies from data, allowing for more efficient and accurate planning in complex environments.

The SDB method works by training a transformer-based model to predict the search dynamics of a planning problem, such as the sequence of states and actions explored during the search process. This learned model is then used to guide the search, effectively "bootstrapping" the planning algorithm to achieve better results.

The paper evaluates SDB on a range of planning problems, including stream-search, motion planning, and partially observable planning. The results demonstrate that SDB significantly outperforms traditional A* search, as well as other state-of-the-art planning approaches, across a variety of metrics such as solution quality, planning time, and task completion rate.

Critical Analysis

The paper presents a compelling approach to improving planning algorithms by leveraging the power of transformer models. The key strength of SDB is its ability to learn effective search strategies from data, which can lead to significant performance gains compared to traditional methods.

However, the paper does acknowledge certain limitations and areas for further research. For example, the performance of SDB may be sensitive to the quality and diversity of the training data, and the method may not generalize well to completely novel planning problems. Additionally, the computational overhead of training and running the transformer-based model could be a concern in certain real-time applications.

Further research could explore ways to address these limitations, such as developing techniques for efficient transfer learning or incorporating the SDB approach into a hybrid planning system that combines the strengths of both traditional and learning-based methods. Additionally, it would be interesting to see how the decision transformer and other transformer-based reasoning approaches could be integrated with or complement the SDB framework.

Overall, the paper presents a promising direction for improving planning algorithms and highlights the potential of transformer-based models to tackle complex decision-making tasks.

Conclusion

The paper introduces a novel planning approach called "Search Dynamics Bootstrapping" (SDB) that leverages transformer-based models to learn effective search strategies and outperform traditional A* search algorithms. The method has been evaluated on a range of planning problems, including stream-search, motion planning, and partially observable planning, demonstrating significant improvements over existing approaches.

The key contribution of this work is the insight that transformer models can be used to learn and leverage the dynamics of the search process, leading to more efficient and accurate planning. This approach has the potential to advance the field of planning and decision-making, especially in complex and dynamic environments where traditional methods struggle.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Stream of Search (SoS): Learning to Search in Language

Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman

Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.

4/8/2024

cs.LG cs.AI cs.CL

Transfer Learning Study of Motion Transformer-based Trajectory Predictions

Lars Ullrich, Alex McMaster, Knut Graichen

Trajectory planning in autonomous driving is highly dependent on predicting the emergent behavior of other road users. Learning-based methods are currently showing impressive results in simulation-based challenges, with transformer-based architectures technologically leading the way. Ultimately, however, predictions are needed in the real world. In addition to the shifts from simulation to the real world, many vehicle- and country-specific shifts, i.e. differences in sensor systems, fusion and perception algorithms as well as traffic rules and laws, are on the agenda. Since models that can cover all system setups and design domains at once are not yet foreseeable, model adaptation plays a central role. Therefore, a simulation-based study on transfer learning techniques is conducted on basis of a transformer-based model. Furthermore, the study aims to provide insights into possible trade-offs between computational time and performance to support effective transfers into the real world.

4/15/2024

cs.LG cs.RO

🔄

Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making

Lei Zhuang, Jingdong Zhao, Yuntao Li, Zichun Xu, Liangliang Zhao, Hong Liu

Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often result in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), which synergizes an Environmental Information Semantic Encoder (EISE) with a Motion Planning Transformer (MPT). EISE converts environmental data into semantic environmental information (SEI), providing MPT with an enriched environmental comprehension. MPT leverages an attention mechanism to dynamically recalibrate its focus on SEI, task objectives, and historical planning data, refining the sampling node generation. To demonstrate the capabilities of TEMP, we train our model using a dataset comprised of planning results produced by the RRT*. EISE and MPT are collaboratively trained, enabling EISE to autonomously learn and extract patterns from environmental data, thereby forming semantic representations that MPT could more effectively interpret and utilize for motion planning. Subsequently, we conducted a systematic evaluation of TEMP's efficacy across diverse task dimensions, which demonstrates that TEMP achieves exceptional performance metrics and a heightened degree of generalizability compared to state-of-the-art SBMPs.

5/1/2024

cs.RO cs.AI

Transformer Based Planning in the Observation Space with Applications to Trick Taking Card Games

Douglas Rebstock, Christopher Solinas, Nathan R. Sturtevant, Michael Buro

Traditional search algorithms have issues when applied to games of imperfect information where the number of possible underlying states and trajectories are very large. This challenge is particularly evident in trick-taking card games. While state sampling techniques such as Perfect Information Monte Carlo (PIMC) search has shown success in these contexts, they still have major limitations. We present Generative Observation Monte Carlo Tree Search (GO-MCTS), which utilizes MCTS on observation sequences generated by a game specific model. This method performs the search within the observation space and advances the search using a model that depends solely on the agent's observations. Additionally, we demonstrate that transformers are well-suited as the generative model in this context, and we demonstrate a process for iteratively training the transformer via population-based self-play. The efficacy of GO-MCTS is demonstrated in various games of imperfect information, such as Hearts, Skat, and The Crew: The Quest for Planet Nine, with promising results.

4/23/2024

cs.AI cs.LG