Publication | ICRA Workshop on RL for Contact-Rich Manipulation 2022

Learning Dense Reward with Temporal Variant Self-Supervision

Reinforcement learning (RL) is gaining momentum in solving complex real-world robotics problems. One challenging category is contact-rich manipulation. The success of RL in these scenarios depends on a reliable reward system. Although this class of problems is marked by rich, high-dimensional, continuous observations, it is typically hard to design a dense reward that harnesses this richness to guide RL training. The conventional alternative, a sparse boolean reward (e.g., 1 if the task is successfully completed and 0 otherwise), often makes training difficult and inefficient. This difficulty has led to the practice of reward engineering, where rewards are hand-crafted based on domain knowledge and trial and error. However, such solutions often require extensive robotics expertise and can be task-specific.
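To make the sparse-versus-dense distinction concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the function names, the use of a scalar distance to the goal, and the exponential shaping are all assumptions chosen for the example.

```python
import math

def sparse_reward(distance_to_goal: float, tolerance: float = 0.01) -> float:
    """Boolean reward: 1 only when the task counts as complete, 0 otherwise."""
    return 1.0 if distance_to_goal <= tolerance else 0.0

def dense_reward(distance_to_goal: float, scale: float = 5.0) -> float:
    """Hypothetical hand-crafted shaped reward: rises smoothly toward 1 near the goal."""
    return math.exp(-scale * distance_to_goal)

# A made-up episode in which the robot steadily approaches the goal.
distances = [0.50, 0.25, 0.10, 0.005]
sparse = [sparse_reward(d) for d in distances]
dense = [dense_reward(d) for d in distances]

for d, s, r in zip(distances, sparse, dense):
    print(f"distance={d:.3f}  sparse={s:.0f}  dense={r:.3f}")
```

The sparse signal stays at 0 until the final step, so the agent gets no feedback about partial progress, whereas the shaped signal increases at every step. Writing such a shaping term by hand is the reward engineering described above.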

In this research, we propose an end-to-end learning framework that can extract dense rewards from multimodal observations. The reward is learned in a self-supervised manner by combining one or two human demonstrations with a physics simulator, and can then be used directly to train RL algorithms. We evaluate our framework on two contact-rich manipulation tasks: joint assembly and door opening.

This paper makes two main contributions: 1) a temporal variant forward sampling (TVFS) method that is more robust and cost-efficient in generating samples from human demonstrations for contact-rich manipulation tasks, and 2) a self-supervised latent representation learning architecture that can utilize sample pairs from TVFS.
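This page does not spell out how TVFS works, so the following Python sketch is only one hypothetical reading of sampling pairs with varying temporal gaps from a single demonstration. The function name `sample_time_offset_pairs`, the `max_offset` parameter, and the use of the frame gap as a self-supervised label are assumptions for illustration; the sketch also omits the physics simulator that the framework combines with the demonstrations.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

State = TypeVar("State")

def sample_time_offset_pairs(
    demo: Sequence[State],
    num_pairs: int,
    max_offset: int,
    seed: int = 0,
) -> List[Tuple[State, State, int]]:
    """Draw (earlier, later, offset) triples from one demonstration.

    Hypothetical illustration, not the paper's TVFS: the temporal gap
    between the two frames varies from pair to pair and is returned as
    a label that a self-supervised model could learn to predict.
    """
    if len(demo) < 2 or max_offset < 1:
        raise ValueError("need at least two frames and max_offset >= 1")
    rng = random.Random(seed)
    last = len(demo) - 1
    pairs = []
    for _ in range(num_pairs):
        start = rng.randrange(0, last)  # any frame except the final one
        offset = rng.randint(1, min(max_offset, last - start))
        pairs.append((demo[start], demo[start + offset], offset))
    return pairs

# Toy demonstration: each "state" is just its own time index.
demo = list(range(10))
pairs = sample_time_offset_pairs(demo, num_pairs=5, max_offset=3)
for earlier, later, offset in pairs:
    print(earlier, later, offset)
```

In a real setup each state would be a multimodal observation rather than an integer, and the pairs would feed a representation-learning model of the kind named in the second contribution.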

Abstract

Yuning Wu, Jieliang Luo, Hui Li

Rewards play an essential role in reinforcement learning. In contrast to rule-based game environments with well-defined reward functions, complex real-world robotic applications, such as contact-rich manipulation, lack explicit and informative descriptions that can be used directly as a reward. Previous work has shown that it is possible to algorithmically extract dense rewards directly from multimodal observations. In this paper, we extend that work by proposing a more efficient and robust way of sampling and learning. In particular, our sampling approach utilizes temporal variance to simulate the fluctuating state and action distribution of a manipulation task. We then propose a network architecture for self-supervised learning to better incorporate temporal information in latent representations. We test our approach in two experimental setups, namely joint assembly and door opening. Preliminary results show that our approach is effective and efficient in learning dense rewards, and that the learned rewards lead to faster convergence than baselines.

Associated Researchers

Jieliang (Rodger) Luo

Sr. Principal AI Research Scientist

Hui Li

Senior Principal Research Scientist

Yuning Wu

Carnegie Mellon University
