Learn2Plan: Learning variable ordering heuristics for scalable planning
Abstract
Historically, the combinatorics of various search-based approaches to planning, even in single-agent planning contexts, have presented a solid barrier to scalable performance. These approaches quickly become complicated and overwhelming as the number of goals to be achieved (or tasks to be included in the plan) increases. It is because the number of tasks to be included adds more nodes in the search tree and, with more constraints, increases the constraint satisfaction times. However, we believe there is a way forward. We propose that the combinatorics of a planning problem can be overcome by learning an abstract model of the planner's search that utilizes the characteristics of the current state to learn the relative quality of various search decisions extending the plan. A substantial computational speedup can be achieved by learning on example runs of the planner in a given domain, and the learned model can be used as a surrogate for the explicit combinatorial search. We focus specifically on a framework for multi-agent planning and scheduling cite{parimi2022t} recently developed to support the continual management of a team of humans, robotic agents, and autonomous control systems. Within this framework, a new task request triggers a search for the best HTN decomposition of the request into constituent actions to be added to the plan, the best assignment of resources to those actions, and the best time slots over the planning horizon. The search initialization generates and evaluates all feasible options concerning a specified objective and selects the best option. The computational cost of performing the search increases proportionally with the number of pre-existing actions in the plan, the number of agents (resources) available to carry out these actions, and the number of alternative decompositions associated with a high-level task request.
This thesis proposes two solutions. First, it presents an efficient learning framework to aid the planner in accelerating the time-consuming search process by learning an abstract representation of the search space. This learning framework consists of an LSTM memory network with an MLP baseline that learns from example runs of the planner. Second, it describes a simulation pipeline for enhancing the real-world physical understanding of the reasoning used by the planner. Both of these solutions form a part of a robust, efficient, and interpretable planning system that can also be used in varied domains outside of a space habitat.
BibTeX
@mastersthesis{Misra-2023-135906,author = {Ashwin Misra},
title = {Learn2Plan: Learning variable ordering heuristics for scalable planning},
year = {2023},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-23-14},
keywords = {Search-based planning, Task planning, Learning, LSTM, Multi-Agent planning, Autonomous control, Deep Space Habitats},
}