Publication | Conference on Neural Information Processing Systems 2021

Program Synthesis Guided Reinforcement Learning

(a) The Ant-craft environment. The policy needs to control the ant to perform the crafting tasks. (b) The box-world environment. The grey pixel denotes the agent. The goal is to get the white key. The unobserved parts of the map are marked with “x”. The key currently held by the agent is shown in the top-left corner. In this map, the number of boxes on the path to the goal is 4, and the map contains 1 distractor branch.

This Autodesk Research paper focuses on robotics in manufacturing. Users want to control these robots not by programming “how” to move but rather by giving high-level commands on “what” needs to be done. This paper outlines a way to automatically translate high-level goal specifications into low-level actuations.

Abstract

Yichen Yang, Jeevana Priya Inala, Osbert Bastani, Yewen Pu, Armando Solar-Lezama, Martin Rinard

Conference on Neural Information Processing Systems 2021

A key challenge for reinforcement learning is solving long-horizon planning and control problems. Recent work has proposed leveraging programs to help guide the learning algorithm in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task they seek to achieve. We propose an approach that leverages program synthesis to automatically generate the guiding program. A key challenge is how to handle partially observable environments. We propose model predictive program synthesis, which trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. We evaluate our approach on a set of challenging benchmarks, including a 2D Minecraft-inspired “craft” environment where the agent must perform a complex sequence of subtasks to achieve its goal, a box-world environment that requires abstract reasoning, and a variant of the craft environment where the agent is a MuJoCo Ant. Our approach significantly outperforms several baselines, and performs essentially as well as an oracle that is given an effective program.
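The idea of synthesizing a program from samples of a generative model can be illustrated with a toy sketch. This is not the paper's implementation: the resource names, the fixed list of candidate programs, the per-resource prior that stands in for the learned generative model, and the functions `sample_world`, `program_succeeds`, and `synthesize` are all assumptions made for illustration. The sketch samples several completions of a partially observed world and picks the candidate program that succeeds in the most samples, which is one simple way to be robust to the model's uncertainty.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

World = Dict[str, bool]                     # resource name -> present on the map?
Observation = Dict[str, Optional[bool]]     # None marks an unobserved resource
Program = Tuple[str, ...]                   # sequence of resources to collect

# Hypothetical candidate programs; a real synthesizer would search a program space.
CANDIDATE_PROGRAMS: Sequence[Program] = (
    ("wood", "iron"),
    ("wood", "stone"),
    ("gold",),
)


def sample_world(observed: Observation, prior: Dict[str, float],
                 rng: random.Random) -> World:
    """Stand-in for a learned generative model: keep observed values and
    fill each unobserved entry by sampling from an assumed prior."""
    return {
        name: value if value is not None else rng.random() < prior[name]
        for name, value in observed.items()
    }


def program_succeeds(program: Program, world: World) -> bool:
    """A program succeeds if every resource it needs exists in the world."""
    return all(world.get(resource, False) for resource in program)


def synthesize(observed: Observation, prior: Dict[str, float],
               num_samples: int = 200, seed: int = 0) -> Program:
    """Return the candidate program that succeeds in the most sampled worlds."""
    rng = random.Random(seed)
    worlds = [sample_world(observed, prior, rng) for _ in range(num_samples)]

    def score(program: Program) -> int:
        return sum(program_succeeds(program, world) for world in worlds)

    return max(CANDIDATE_PROGRAMS, key=score)


if __name__ == "__main__":
    # Wood is seen, gold is known to be absent, iron and stone are unobserved.
    observed = {"wood": True, "iron": None, "stone": None, "gold": False}
    prior = {"iron": 0.2, "stone": 0.9}  # assumed beliefs about unobserved cells
    print(synthesize(observed, prior))
```

In a model predictive setting, `synthesize` would presumably be called again whenever the agent observes more of the map, so the guiding program is revised as uncertainty shrinks.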

Associated Researchers

Yewen Pu

Former Autodesk

Yichen Yang

MIT

Jeevana Priya Inala

MIT

Armando Solar-Lezama

MIT

Martin Rinard

MIT

Osbert Bastani

University of Pennsylvania
