Efficient Quadruped Mobility: Harnessing a Generalist Policy for Streamlined Planning

Master's Thesis, Tech. Report, CMU-RI-TR-24-72, December, 2024

Abstract

Navigating quadruped robots through complex, unstructured environments over long horizons remains a critical challenge in robotics. Traditional planning methods support long-horizon reasoning and provide guarantees such as optimality, while learning-based approaches, particularly those leveraging Deep Reinforcement Learning (DRL), offer robustness and adaptability. This thesis introduces S3D-OWNS (Skilled 3D-Optimal Waypoint Navigation System), a novel hybrid framework that combines the strengths of both paradigms to achieve efficient and adaptive quadrupedal locomotion.

The S3D-OWNS framework integrates a high-level sampling-based planner with a generalist DRL-trained locomotion policy. The planner handles long-horizon navigation and optimal path planning by reasoning over obstacle traversability, while the learned policy executes agile, real-time locomotion tasks such as walking, jumping, and climbing. Delegating these complex maneuvers to the DRL-trained policy reduces the dimensionality of the planning state space, which makes planning computationally efficient. This integration enables the system to navigate cluttered environments while optimizing for energy consumption, time efficiency, and task success.
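
The division of labor described above can be illustrated with a minimal sketch. It is not the thesis implementation: it swaps in a plain Dijkstra search over a fixed set of 2D waypoints for the sampling-based planner, and the `Obstacle` class, `MAX_CLIMB_HEIGHT`, `CLIMB_PENALTY`, and the connection `radius` are all hypothetical names and values chosen for illustration. The point it shows is that the planner state is only an (x, y) waypoint, and that an obstacle the policy is assumed able to climb adds cost to an edge instead of blocking it.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obstacle:
    """Axis-aligned box footprint with a height (hypothetical representation)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    height: float


MAX_CLIMB_HEIGHT = 0.3  # assumed skill limit of the learned policy, in metres
CLIMB_PENALTY = 1.0     # assumed extra cost for climbing over one obstacle


def segment_hits(p, q, obs, samples=20):
    """Return True if a sampled point on the segment p->q lies in the footprint."""
    for i in range(samples + 1):
        t = i / samples
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if obs.x_min <= x <= obs.x_max and obs.y_min <= y <= obs.y_max:
            return True
    return False


def edge_cost(p, q, obstacles):
    """Cost of one segment, or None if it crosses an obstacle that is too tall."""
    cost = math.dist(p, q)
    for obs in obstacles:
        if segment_hits(p, q, obs):
            if obs.height > MAX_CLIMB_HEIGHT:
                return None          # not traversable: behaves like a wall
            cost += CLIMB_PENALTY    # traversable: allowed, but more expensive
    return cost


def plan(waypoints, start, goal, obstacles, radius=1.5):
    """Dijkstra over 2D waypoints; edges join waypoints closer than `radius`."""
    best = {start: 0.0}
    parent = {start: None}
    frontier = [(0.0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], cost
        if cost > best[node]:
            continue  # stale queue entry
        for nxt in waypoints:
            if nxt == node or math.dist(node, nxt) > radius:
                continue
            step = edge_cost(node, nxt, obstacles)
            if step is None:
                continue
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(frontier, (new_cost, nxt))
    return None, math.inf


# Usage: a 5 x 3 grid of waypoints with a wall between x = 1 and x = 2.
grid = [(float(x), float(y)) for x in range(5) for y in range(3)]
low_wall = [Obstacle(1.4, 1.6, -0.5, 1.6, height=0.2)]
tall_wall = [Obstacle(1.4, 1.6, -0.5, 1.6, height=1.0)]
print(plan(grid, (0.0, 0.0), (4.0, 0.0), low_wall))
print(plan(grid, (0.0, 0.0), (4.0, 0.0), tall_wall))
```

With these assumed numbers, the low wall is crossed directly because the climb penalty is cheaper than the detour, while the tall wall forces the path around it, which is the behavior a pure collision-avoidance planner would show in both cases.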

Key innovations of this work include:
• A goal-conditioned locomotion policy trained across diverse terrains using DRL, ensuring robustness and adaptability.
• A sampling-based planner that evaluates obstacle traversability based on the robot’s capabilities, enabling efficient path planning beyond traditional obstacle avoidance.
• Cost predictors trained using GPU parallelization to estimate energy expenditure, traversal time, and success likelihood for each path segment.
• A modular design that simplifies heading control by implicitly aligning the robot’s orientation through waypoint placement.
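
Two of the items above lend themselves to a short sketch. The abstract does not say how the predicted energy, time, and success likelihood are combined, nor how waypoints are handed to the policy, so everything below is an assumption made for illustration: `segment_cost` uses a simple weighted sum with a failure penalty, and `implied_headings` derives each segment's yaw from the direction to the next waypoint, so that heading never has to be a planning variable. All function names, weights, and default values are hypothetical.

```python
import math


def segment_cost(energy, time, p_success,
                 w_energy=1.0, w_time=0.0, w_failure=100.0):
    """Scalarize predicted quantities for one path segment.

    Assumed form: weighted energy and time plus a penalty that grows
    as the predicted success likelihood drops.
    """
    return w_energy * energy + w_time * time + w_failure * (1.0 - p_success)


def implied_headings(path):
    """Yaw in radians for each segment, pointing at the next waypoint."""
    return [
        math.atan2(b[1] - a[1], b[0] - a[0])
        for a, b in zip(path, path[1:])
    ]


# Usage with made-up predictor outputs for a single segment.
energy_first = segment_cost(energy=10.0, time=2.0, p_success=0.9)
time_first = segment_cost(energy=10.0, time=2.0, p_success=0.9,
                          w_energy=0.0, w_time=1.0)
print(energy_first, time_first)
print(implied_headings([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
```

Changing the weights switches the objective between energy and time without touching the search itself, and placing the next waypoint ahead of an obstacle is enough to orient the robot toward it under this scheme.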

Extensive experimentation in simulated environments with a Unitree Go1 quadruped demonstrates that S3D-OWNS significantly outperforms traditional collision-avoidance planners in navigation efficiency. The system optimally utilizes the robot’s climbing and jumping skills to reduce energy consumption or traversal time across challenging terrains. Ablation studies validate the contributions of individual components, highlighting improvements in operational efficiency and task success rates.

This research advances quadrupedal robotics by showcasing how hybrid systems can combine classical planning with modern AI-driven techniques to achieve both optimality and adaptability. The scalability of S3D-OWNS across different robot models and sensor configurations makes it applicable to diverse domains such as industrial automation, search-and-rescue missions, and exploration in unstructured environments. By addressing key challenges in long-horizon navigation and dynamic terrain adaptation, this work sets a foundation for more efficient and versatile robotic systems.

BibTeX

@mastersthesis{Mondal-2024-144875,
author = {Sayan Mondal},
title = {Efficient Quadruped Mobility: Harnessing a Generalist Policy for Streamlined Planning},
year = {2024},
month = {December},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-72},
keywords = {Deep Reinforcement Learning, Planning},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.