Collaborative Execution of Exploration and Tracking Using Move Value Estimation for Robot Teams (MVERT)
Abstract
This work presents Move Value Estimation for Robot Teams (MVERT), a behavior-based architecture for selecting low-level actions at the execution layer of a robot control architecture. MVERT is specifically designed for multi-robot teams. The design goal for MVERT is to produce reasonable performance that takes advantage of the heterogeneous multi-robot team while maintaining computational efficiency by approximating optimal performance. Here, actions are defined as selecting a pose to move to in the next time step. The MVERT action selection architecture represents progress toward mission goals with mathematical value functions that map current state and potential actions to a numerical value representing progress. Current state includes the current locations of teammates as well as objects in the environment. Using the current state and models of the teammates' capabilities, approximations of their contributions toward tasks in the next step can be made by applying the value functions. Given these predictions, each robot can select the action for which the team's overall progress will be maximized. Computation is made scalable by adjusting the number of candidate moves considered by each robot and the complexity of the models and value functions to be evaluated at each step. MVERT is fully distributed: each agent makes its own movement decisions based on its own knowledge. Other approaches (optimal trajectory planning, for example) in large state-spaces may be computationally prohibitive, particularly for replanning online during mission execution. However, taking advantage of a team's multi-agent nature to provide efficiency requires consideration of teammate contributions. MVERT can be used alone to guide a team or it may be integrated with a pre-defined plan (provided by an AI planner or a human). In the MVERT architecture, each mission task is described by a value function which maps a robot's pose to utility in completing the task.
In selecting an action, each robot approximates the next-step contributions of each teammate by applying the teammates' sensing models and current locations in the value functions. The robot then evaluates move actions by applying the value functions and the robot's own sensor models to candidate locations. Reducing the number of candidate move actions lowers the computation required when needed. The action that results in the overall highest-valued pose is selected and executed. Weighting multiple tasks allows prioritizing tasks in accordance with desired performance, and weights can be dynamically adapted (directly or autonomously) as the plan and environment change. The potential applications of MVERT include any tasks that can be represented by some computable mathematical function. These functions need not be smooth, continuous, or differentiable, as they are evaluated at the current state and not optimized. Values may depend on any aspects of the current state of the world, and will typically depend on robot poses, object locations, and robot capabilities and sensing models. These functions may also be time dependent, changing mission priorities as time elapses. The total value of a state is a weighted average of the individual values for each task. The weights assigned to each task determine the resulting behavior of the team by prioritizing some tasks relative to others. As the potential for improving value on some tasks decreases, the weights automatically shift focus to the other tasks. Weights may differ among robots to reflect capabilities or to diversify the team by giving subteams different priorities. MVERT has been applied in simulation and on teams of physical robots (Sony Aibos) within a behavior-based robot control architecture.
Applications include mapping unknown target locations within a known environment, target mapping within an unknown environment, dynamic target tracking within an environment with known landmarks, and more complex missions (requiring mapping, detailed action at specified locations, exploration, and maintaining line-of-sight for communications). When using MVERT, robots are able to greatly improve performance in terms of the quality of resulting maps and exploration efficiency when compared to individual action selection. MVERT can also be applied to missions with multiple tasks by weighting individual task values according to priority to produce a contextually appropriate action.
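The action-selection loop described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation; it only shows the general idea under several assumptions: poses are 2D points with no heading, teammates are approximated as staying at their current locations for the next step, and the single example task (`coverage_value`, which rewards having some robot close to each target) is a hypothetical stand-in for a real sensing model. All function names here are invented for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float]  # (x, y); heading is omitted in this sketch
# A value function maps the team's poses to a number representing task progress.
ValueFn = Callable[[Sequence[Pose]], float]
Task = Tuple[float, ValueFn]  # (weight, value function)


def candidate_moves(pose: Pose, step: float, n: int) -> List[Pose]:
    """Return the current pose plus n evenly spaced poses one step away."""
    x, y = pose
    ring = [
        (x + step * math.cos(2 * math.pi * k / n),
         y + step * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
    return [pose] + ring


def team_value(poses: Sequence[Pose], tasks: Sequence[Task]) -> float:
    """Weighted average of the per-task values for one set of team poses."""
    total_weight = sum(w for w, _ in tasks)
    return sum(w * fn(poses) for w, fn in tasks) / total_weight


def select_move(own_pose: Pose, teammate_poses: Sequence[Pose],
                tasks: Sequence[Task], step: float = 1.0, n: int = 8) -> Pose:
    """Pick the candidate pose with the highest estimated team value.

    Assumption: teammates contribute from their current poses.
    """
    return max(
        candidate_moves(own_pose, step, n),
        key=lambda c: team_value([c, *teammate_poses], tasks),
    )


def coverage_value(targets: Sequence[Pose]) -> ValueFn:
    """Hypothetical task: each target is valued by its closest robot."""
    def value(poses: Sequence[Pose]) -> float:
        return sum(max(math.exp(-math.dist(p, t)) for p in poses)
                   for t in targets)
    return value


# Two targets; a teammate already sits on the left one.
tasks = [(1.0, coverage_value([(-3.0, 0.0), (5.0, 0.0)]))]
alone = select_move((0.0, 0.0), [], tasks)
with_teammate = select_move((0.0, 0.0), [(-3.0, 0.0)], tasks)
```

In this toy setup, the robot acting alone steps toward the nearer left target, while the robot that accounts for its teammate steps toward the uncovered right target. Raising `n` evaluates more candidate moves at higher cost, which mirrors the scalability trade-off mentioned above.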
BibTeX
@phdthesis{Stroupe-2003-8738,
author = {Ashley Stroupe},
title = {Collaborative Execution of Exploration and Tracking Using Move Value Estimation for Robot Teams (MVERT)},
year = {2003},
month = {September},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-03-07},
}