From Mystery to Mastery: Failure Diagnosis for Improving Manipulation Policies



Som Sagar1, Jiafei Duan2, Sreevishakh Vasudevan1, Yifan Zhou1, Heni Ben Amor1, Dieter Fox2,3, and Ransalu Senanayake1
1Arizona State University, 2University of Washington, 3NVIDIA



Abstract

Robot manipulation policies often fail for unknown reasons, posing significant challenges for real-world deployment. Researchers and engineers typically address these failures using heuristic approaches, which are not only labor-intensive and costly but also prone to overlooking critical failure modes (FMs). This paper introduces Robot Manipulation Diagnosis (RoboMD), a systematic framework designed to automatically identify FMs arising from unexpected changes in the environment. To navigate the vast space of potential FMs for a given pre-trained manipulation policy, we leverage deep reinforcement learning (deep RL) to explore and uncover these FMs using a specially trained vision-language embedding that encodes a notion of failures. This approach enables users to probabilistically quantify and rank failures in previously unseen environmental conditions. Through extensive experiments across various manipulation tasks and algorithms, we demonstrate RoboMD's effectiveness in diagnosing unknown failures in unstructured environments, providing a systematic pathway to enhance the robustness of manipulation policies.

Real-World Variations

Summary

Observe Failures, Uncover Failures, Adapt

Robot Manipulation Diagnosis (RoboMD) is a systematic framework designed to automatically identify FMs arising from unanticipated changes in the environment. Because the space of potential FMs for a pre-trained manipulation policy is vast, we leverage deep reinforcement learning (deep RL) to explore and uncover these FMs using a specially trained vision-language embedding that encodes a notion of failures. This approach enables users to probabilistically quantify and rank failures in previously unseen environmental conditions. Through extensive experiments across various manipulation tasks and algorithms, we demonstrate RoboMD's effectiveness in diagnosing unknown failures in unstructured environments, providing a systematic pathway to improve the robustness of manipulation policies.
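To make "quantify and rank failures" concrete, the sketch below estimates a failure probability for each environment variation from rollout outcomes and sorts the variations from most to least failure-prone. It is an illustrative stand-in and not RoboMD's procedure: it uses plain empirical frequencies rather than a learned deep RL policy, and the function name `rank_failure_modes`, the input format, and the example outcomes are all assumptions made up for illustration.

```python
from collections import defaultdict


def rank_failure_modes(rollouts):
    """Rank environment variations by empirical failure probability.

    `rollouts` is an iterable of (variation_name, succeeded) pairs.
    This is a simple frequency estimate (an assumption for illustration),
    not the learned estimate described in the text.
    """
    counts = defaultdict(lambda: [0, 0])  # variation -> [failures, trials]
    for variation, succeeded in rollouts:
        counts[variation][1] += 1
        if not succeeded:
            counts[variation][0] += 1

    probs = {v: fails / trials for v, (fails, trials) in counts.items()}
    # Highest failure probability first; ties broken alphabetically.
    return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical outcomes, invented for this example only.
example_rollouts = [
    ("Red Table", False), ("Red Table", False),
    ("Black Table", True), ("Black Table", False),
    ("Green Lighting", True), ("Green Lighting", True),
]

for name, p in rank_failure_modes(example_rollouts):
    print(f"{name}: estimated failure probability {p:.2f}")
```

With exhaustive rollouts this kind of table is easy to build; the motivation stated above is that the space of variations is too large for that, which is why a learned exploration strategy is used instead.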

Experiments and Results

Radar plots for multiple models

Individual FM analysis of multiple models. Each radar plot represents the failure likelihood of specific actions. The axes correspond to different environmental setups (e.g., Red Cube, Green Table, Blue Table): (a) shows the real-world setup and (b, c) show simulation. The numbers indicate the probability of failure for actions under each configuration.



Table

Comparison of rankings for failure-inducing actions in continuous and discrete action spaces. a_r denotes actions performed in the real robot environment: a_r1 = “Bread” (Unseen), a_r2 = “Red Cube”, a_r3 = “Milk Carton”, a_r4 = “Sprite”. a_s denotes actions performed in the simulated environment: a_s1 = “Red Table”, a_s2 = “Black Table” (Unseen), a_s3 = “Green Lighting”. Rank consistency indicates whether the rankings are preserved across the two formulations. The accuracy is computed over 21 environment variations.
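One way to read "rank consistency" is as a check that two sets of failure scores order the actions identically. The sketch below implements that reading under stated assumptions: the helper names `ranking` and `rank_consistent` are hypothetical, the scores are invented placeholders rather than values from the table, and the exact criterion used for the table may differ.

```python
def ranking(scores):
    """Return action names ordered from highest to lowest failure score.

    `scores` maps an action name to a failure score; ties are broken
    alphabetically so the ordering is deterministic.
    """
    return [name for name, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank_consistent(scores_a, scores_b):
    """True if both score sets produce exactly the same ordering of actions."""
    return ranking(scores_a) == ranking(scores_b)


# Placeholder scores, invented for illustration only.
continuous = {"Red Table": 0.6, "Black Table": 0.3, "Green Lighting": 0.1}
discrete = {"Red Table": 0.5, "Black Table": 0.4, "Green Lighting": 0.1}

print(ranking(continuous))
print(rank_consistent(continuous, discrete))
```

Exact equality of orderings is a strict criterion. A softer alternative would be a rank correlation measure, which tolerates swaps between actions with nearly equal scores.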

Simulation Variations