Recent empirical work shows that inconsistent results arising from the choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research. When comparing two algorithms J and K, searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. In this talk, I will discuss work from NeurIPS 2020 in which we provide a theoretical complement to this prior empirical work, arguing that, to avoid such deception, the process of drawing conclusions from HPO should be made more rigorous. We name this process epistemic hyperparameter optimization (EHPO), and put forth a logical framework to capture its semantics and how it can lead to inconsistent conclusions about performance. The framework enables us to prove that certain EHPO methods are guaranteed to be defended against deception, given a bounded compute-time budget t. I will show how the framework is used to prove that a variant of random search is defended and to validate that defense empirically, and close with broader takeaways concerning the future of robust HPO research.
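As a toy illustration of the deception described above (a made-up example for intuition, not the paper's EHPO construction), the sketch below defines two hypothetical validation-accuracy curves for algorithms J and K and compares them over two different learning-rate subspaces; the small-learning-rate subspace concludes that J wins, while the large-learning-rate subspace concludes that K wins.

```python
# Toy illustration of hyperparameter deception (not the paper's EHPO procedure).
# The two "validation accuracy" curves below are invented for illustration only.
import numpy as np

def val_accuracy_J(lr):
    # Hypothetical: J does best around lr = 1e-3.
    return 0.90 - 0.02 * (np.log10(lr) + 3.0) ** 2

def val_accuracy_K(lr):
    # Hypothetical: K does best around lr = 1e-1.
    return 0.90 - 0.02 * (np.log10(lr) + 1.0) ** 2

def best_over(grid, f):
    # "Tune" by taking the best score found in the searched subspace.
    return max(f(lr) for lr in grid)

subspaces = {
    "small-lr subspace": [1e-4, 3e-4, 1e-3],
    "large-lr subspace": [1e-2, 3e-2, 1e-1],
}
for name, grid in subspaces.items():
    j, k = best_over(grid, val_accuracy_J), best_over(grid, val_accuracy_K)
    print(f"{name}: best J = {j:.3f}, best K = {k:.3f} -> conclude {'J' if j > k else 'K'} wins")
```

Running this prints opposite conclusions for the two subspaces, which is exactly the kind of inconsistency EHPO is meant to guard against.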
While sequential decision-making algorithms like reinforcement learning have seen much broader applicability and success, they are still often deployed under the assumption that the train and test sets of MDPs are drawn IID from the same distribution. But in most real-world production systems distribution shift is ubiquitous, and any system designed for real-world deployment must handle it robustly. An RL agent deployed in the wild must be robust to data distribution shifts arising from the diversity and dynamism of the real world. In this talk, I will describe two scenarios where such distribution shifts occur: (i) offline reinforcement learning and (ii) meta reinforcement learning. In both scenarios, I will discuss how dealing with distribution shift requires carefully training dynamic, adaptive policies that can infer and adapt to the degree of shift they face. This allows agents to go beyond the standard requirement that train and test distributions match, and to show improved performance in scenarios with significant distribution shift. I will discuss how this framework allows us to build adaptive and robust simulated robotics systems.
Relevant papers:
(1) Offline RL policies should be trained to be adaptive (ICML 2022)
(2) Distributionally Adaptive Meta RL (NeurIPS 2022)
(3) Is conditional generative modeling all you need for decision making? (FMDM Workshop, NeurIPS 2022)
The Markov assumption is pervasive in reinforcement learning. By modeling problems as Markov decision processes, agents act as though they can always observe the complete state of the world. While this assumption is sometimes a useful fiction, in general decision processes agents must find ways to cope with only partial information. Classical techniques for partial observability typically require access to unobservable or hard-to-acquire information (like the complete set of possible world states, or knowledge of mutually exclusive potential futures). Meanwhile, modern recurrent neural networks, which rely only on observables and simple forms of memory, have proven remarkably effective in practice, but lack a principled theoretical framework for understanding when and what agents should remember. And yet, despite its flaws, the Markov assumption may offer a path towards precisely this type of understanding. We show that estimating the value of the agent's policy both with and without the Markov assumption leads to a value discrepancy in non-Markov environments that appears to reliably indicate when memory is useful. We present initial progress towards a theory of such value discrepancies, and sketch an algorithm for automatically learning memory functions by uncovering and subsequently minimizing those discrepancies. Our approach suggests that agents can make effective decisions in general decision processes as long as they remember whatever information is necessary for them to trust their value function estimates.
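One natural way to write down such a discrepancy (the notation below is assumed here and may differ from the talk's exact formulation) is as the gap between two estimates of the same policy's value computed from the agent's own experience: a TD-style estimate that treats observations as if they were Markov states, and a Monte Carlo estimate that makes no such assumption.

```latex
% A hedged sketch, under assumed notation: \omega is an observation, \pi a policy,
% V^{\pi}_{TD} the TD fixed point obtained by pretending observations are Markov
% states, and V^{\pi}_{MC} the Monte Carlo estimate of the same value.
\[
  \Delta^{\pi}(\omega) \;=\; V^{\pi}_{\mathrm{TD}}(\omega) \;-\; V^{\pi}_{\mathrm{MC}}(\omega)
\]
% If the observations really are Markov, both estimators converge to the true value
% and \Delta^{\pi} vanishes; a persistent nonzero \Delta^{\pi} signals that the
% Markov assumption is being violated in a way that matters for value estimation,
% i.e., that memory may be useful.
```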
Reward is the driving force for reinforcement-learning agents. In this talk, I will present our recent NeurIPS paper that explores the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task” that might be of interest: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while Markov reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide polynomial-time algorithms that, for each of the three task types, construct a Markov reward function allowing an agent to optimize the task, and that correctly determine when no such reward function exists. I conclude by summarizing recent follow-up work that studies alternatives for enriching the expressivity of reward.
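For intuition, here is one simple instance of task type (1) that no Markov reward function can capture; it is a hedged reconstruction in the spirit of this line of work, not necessarily the paper's own example.

```latex
% Hedged example: a single-state MDP with two actions a and b, where the set of
% acceptable behaviors is exactly the two deterministic policies
\[
  \Pi_{\mathrm{good}} \;=\; \{\, \pi_a : \text{always choose } a,\;\; \pi_b : \text{always choose } b \,\},
\]
% excluding every stochastic mixture of the two. A Markov reward function here is
% just a pair of numbers (R(a), R(b)):
%   - if R(a) = R(b), then every policy, including all mixtures, is optimal;
%   - if R(a) \neq R(b), then only one of \pi_a, \pi_b is optimal.
% In neither case does the set of optimal policies equal \Pi_{\mathrm{good}},
% so no Markov reward function expresses this task.
```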
Following work on joint object-action representations, the functional object-oriented network (FOON) was introduced as a knowledge graph representation for robots. Taking the form of a bipartite graph, a FOON contains symbolic (high-level) concepts pertinent to a robot's understanding of its environment and tasks, in a way that mirrors human understanding of actions. However, little work has been done to demonstrate how task plans acquired from a FOON can be used for task execution by a robot, as the concepts typically found in a FOON are too abstract for immediate execution. To address this, we incorporate a hierarchical task planning approach that translates a FOON graph into a PDDL-based representation of domain knowledge for manipulation planning. As a result, a task plan can be acquired that a robot can execute from start to end, leveraging action contexts and motion primitives in the form of dynamic movement primitives (DMPs). Learned action contexts can then be extended to never-before-seen scenarios.
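To make the structure concrete, the sketch below is a deliberately simplified rendering (not the authors' code; the names FunctionalUnit, to_operator, and the "pour" example are illustrative assumptions) of a FOON functional unit, with input object nodes, a motion node, and output object nodes, and of how such a unit could be mapped to a planner-style operator with preconditions and effects.

```python
# Simplified sketch of a FOON functional unit and a naive mapping to a
# planner-style operator. All names here are illustrative assumptions; a real
# translation to PDDL would also need typed parameters, geometric grounding,
# and the associated motion primitive (DMP) for execution.
from dataclasses import dataclass, field

@dataclass
class FunctionalUnit:
    motion: str                                   # motion node, e.g. "pour"
    inputs: list = field(default_factory=list)    # (object, state) input nodes
    outputs: list = field(default_factory=list)   # (object, state) output nodes

def to_operator(unit: FunctionalUnit) -> dict:
    """Map input object states to preconditions and output object states to effects."""
    return {
        "name": unit.motion,
        "preconditions": [f"(state {obj} {state})" for obj, state in unit.inputs],
        "effects": [f"(state {obj} {state})" for obj, state in unit.outputs],
    }

pour = FunctionalUnit(
    motion="pour",
    inputs=[("cup", "contains-water"), ("bowl", "empty")],
    outputs=[("cup", "empty"), ("bowl", "contains-water")],
)
print(to_operator(pour))
```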
Multi-agent Path Finding (MAPF) is the problem of finding collision-free paths for a set of agents to their goals. It is an important problem for many multi-robot applications, especially in automated warehouses. MAPF has been well studied, with many algorithms developed to solve MAPF instances optimally. However, the characteristics of these algorithms are poorly understood: no single algorithm dominates the others, and it is hard to determine which algorithm should be used for a given instance, or on which instances all of them will struggle to find a solution. In this talk, I will present results from two papers that seek to better understand the performance of MAPF algorithms. The first part of the talk will cover our MAPF Algorithm SelecTor (MAPFAST), a deep learning approach to predicting which algorithm will perform best on a given instance. The second part of the talk will cover the role the betweenness centrality of the environment plays in the empirical difficulty of MAPF instances.
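As a rough sketch of the environment feature examined in the second part of the talk (this is not the papers' code, and the toy map below is made up), the snippet computes betweenness centrality over a small 4-connected grid with a wall using networkx; cells that many shortest paths are forced through receive high centrality scores, and it is this kind of bottleneck structure that the talk relates to empirical difficulty.

```python
# Hedged sketch: betweenness centrality of a toy warehouse-like map, computed
# with networkx. The map and numbers are illustrative, not from the papers.
import networkx as nx

rows, cols = 5, 5
obstacles = {(2, 1), (2, 2), (2, 3)}      # a wall across the middle row, gaps at each end
G = nx.grid_2d_graph(rows, cols)          # 4-connected grid graph
G.remove_nodes_from(obstacles)

# Bottleneck cells that many shortest paths must pass through get high scores.
bc = nx.betweenness_centrality(G)
for node, score in sorted(bc.items(), key=lambda kv: -kv[1])[:5]:
    print(node, round(score, 3))
```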
What properties of an MDP allow for generalization in reinforcement learning? How does representation learning help with this? Although the analogous questions are well understood in supervised learning, providing a theory that answers these fundamental questions in reinforcement learning is challenging.
In this talk, we will discuss a number of recent works on the statistical and computational views of these questions. We will start from a statistical point of view, where we will see algorithmic ideas for sample-efficient reinforcement learning. Then, we will move to the computational side and give evidence that the computational and statistical views of RL are fundamentally different, by showing a surprising computational-statistical gap in reinforcement learning. Along the way, we will make progress on one of the most fundamental questions in reinforcement learning with linear function approximation: if the optimal value function (Q* or V*) is linear in a given d-dimensional feature mapping, is efficient reinforcement learning possible?
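For reference, the linear-realizability assumption behind that last question can be stated as follows (the notation here is assumed; the papers give the formal setup): given a known feature map, the optimal value function is assumed to be exactly linear in the features.

```latex
% Hedged formal sketch of linear realizability of Q*.
% \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d is a known feature map.
\[
  \exists\, \theta^{*} \in \mathbb{R}^{d} \ \ \text{such that}\ \
  Q^{*}(s, a) \;=\; \langle \phi(s, a),\, \theta^{*} \rangle
  \quad \text{for all } (s, a) \in \mathcal{S} \times \mathcal{A}.
\]
% The question is whether, under only this assumption, an agent can find a
% near-optimal policy with samples and computation polynomial in d and the
% horizon, rather than in the size of the state space.
```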
Reinforcement learning has achieved recent success in a number of difficult, high-dimensional environments. However, these methods generally require millions of samples from the environment to learn optimal behaviors, limiting their real-world applicability. This work therefore aims to create a principled framework in which lifelong agents learn essential skills and combine them to solve new compositional tasks without further learning. To achieve this, we design useful skill representations for each task and construct a Boolean algebra over the set of tasks and skills. This enables us to compose learned skills to immediately solve new tasks that are expressible as a logical composition of past tasks. We present theoretical guarantees for our framework and demonstrate its usefulness for lifelong learning in a number of experiments.
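As a hedged, simplified sketch of the zero-shot composition idea in this line of work (the pointwise min/max/negation operators below are a common simplification; the paper's exact construction and the conditions under which composition is provably correct may differ), learned value functions for base tasks can be combined elementwise to act on logical combinations of those tasks without further learning.

```python
# Simplified sketch of composing learned Q-functions with Boolean-style operators.
# The random arrays stand in for Q-functions learned on two base tasks; the
# operators are a common simplification, not necessarily the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 6, 4

Q_A = rng.uniform(size=(n_states, n_actions))   # stand-in: Q learned for task A
Q_B = rng.uniform(size=(n_states, n_actions))   # stand-in: Q learned for task B

Q_sup = np.maximum(Q_A, Q_B)   # crude stand-in for the Q of the "maximum" task
Q_inf = np.minimum(Q_A, Q_B)   # crude stand-in for the Q of the "minimum" task

Q_or  = Q_sup                              # "A or B"  ~ pointwise max
Q_and = Q_inf                              # "A and B" ~ pointwise min (approximate)
Q_not_B = (Q_sup + Q_inf) - Q_B            # "not B", relative to the sup/inf tasks
Q_A_and_not_B = np.minimum(Q_A, Q_not_B)   # "A and not B", with no new learning

greedy_policy = Q_A_and_not_B.argmax(axis=1)   # act greedily on the composed values
print(greedy_policy)
```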