diff --git a/pasttalks.md b/pasttalks.md index 1148884..b63f7ea 100644 --- a/pasttalks.md +++ b/pasttalks.md @@ -92,3 +92,111 @@ permalink: /pasttalks/ + +

2022

+| Talk | Speaker |
+| --- | --- |
+| Handling Distribution Shifts by training RL agents to be adaptive [abstract][recording] | Anurag Ajay |
+| Resource Optimization for Learning in Robotics [abstract][recording] | Shivam Vats |
+| Towards understanding self-supervised representation learning [abstract][recording] | Nikunj Saunshi |
+| Integrating Psychophysiological Measurements with Robotics in Dynamic Environments [abstract][recording] | Pooja Bovard and Courtney Tse |
+| Learning and Memory in General Decision Processes [abstract][recording] | Cam Allen |
+| Creating Versatile Learning Agents Via Lifelong Compositionality [abstract][recording] | Jorge Mendez |
+| Learning Scalable Strategies for Swarm Robotic Systems [abstract][recording] | Lishuo Pan |
+| Dynamic probabilistic logic models for effective task-specific abstractions in RL [abstract][recording] | Harsha Kokel |
+| Why is this Taking so Dang Long? The Performance Characteristics of Multi-agent Path Finding Algorithms [abstract][recording] | Eric Ewing |
+| Learning-Augmented Anticipatory Planning: designing capable and trustworthy robots that plan despite missing knowledge [abstract][recording] | Gregory Stein |
+| Towards Lifelong Reinforcement Learning through Zero-Shot Logical Composition [abstract][recording] | Geraud Nangue Tasse |
+| WHAT IS ARiSE AND ITS PURPOSE, HOW DOES IT BENEFIT HAMPTON UNIVERSITY, LOCAL MILITARY/GOVERNMENT AGENCIES, UAS INDUSTRIES, AND THE COMMUNITIES OF HAMPTON ROADS [abstract][recording] | John P Murray |
+| Learning and Using Hierarchical Abstractions for Efficient Taskable Robots [abstract][recording] | Naman Shah |
+| Representation in Robotics [abstract][recording] | Kaiyu Zheng |
+| Toward More Robust Hyperparameter Optimization [abstract][recording] | A. Feder Cooper |
+| Statistical and Computational Issues in Reinforcement Learning (with Linear Function Approximation) [abstract][recording] | Gaurav Mahajan |
+| On the Expressivity of Markov Reward [abstract][recording] | Dave Abel |
+| Robot Skill Learning via Representation Sharing and Reward Conditioning [abstract] & Shape-Based Transfer of Generic Skills [abstract][recording] | Tuluhan Akbulut & Skye Thompson |
+| Hardware Architecture for LiDAR Point Cloud Processing in Autonomous Driving [abstract][recording] | Xinming Huang |
+| Working with Spot [abstract] & Count-based exploration [abstract][recording] | Kaiyu Zheng & Max Merlin & Sam Lobel |
+| MICo: Improved representations via sampling-based state similarity for Markov decision processes [abstract][recording] | Pablo Samuel Castro |
+| Weak inductive biases for composable primitive representations [abstract][recording] | Wilka Carvalho |
+| Mirror Descent Policy Optimization [abstract][recording] | Manan Tomar |
+| Joint Task and Motion Planning with the Functional Object-Oriented Network [abstract][recording] | David Paulius |
diff --git a/pasttalks/abstracts/afedercooper.txt b/pasttalks/abstracts/afedercooper.txt new file mode 100644 index 0000000..f032e51 --- /dev/null +++ b/pasttalks/abstracts/afedercooper.txt @@ -0,0 +1 @@ +Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research. When comparing two algorithms J and K, searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. In this talk, I will discuss work from NeurIPS 2020 in which we provide a theoretical complement to this prior empirical work, arguing that, to avoid such deception, the process of drawing conclusions from HPO should be made more rigorous. In this work, we name this process epistemic hyperparameter optimization (EHPO), and put forth a logical framework to capture its semantics and how it can lead to inconsistent conclusions about performance. Our framework enables us to prove EHPO methods that are guaranteed to be defended against deception, given a bounded compute time budget t. I will show how our framework is useful for proving and empirically validating a defended variant of random search, and close with broader takeaways concerning the future of robust HPO research. diff --git a/pasttalks/abstracts/anuragajay.txt b/pasttalks/abstracts/anuragajay.txt new file mode 100644 index 0000000..d25e60d --- /dev/null +++ b/pasttalks/abstracts/anuragajay.txt @@ -0,0 +1,7 @@ +While sequential decision-making algorithms like reinforcement learning have been seeing much broader applicability and success, they are still often deployed under the assumption that the train and test set of MDPs are drawn IID from the same distribution. But in most real-world production systems, distribution shift is ubiquitous, and any system designed for real-world deployment must be able to handle this robustly. An RL agent deployed in the wild must be robust to data distribution shifts arising from the diversity and dynamism of the real world. In this talk, I will describe two scenarios where such data distribution shifts can occur: (i) offline reinforcement learning and (ii) meta reinforcement learning. In both scenarios, I will discuss how dealing with distribution shift requires careful training of dynamic, adaptive policies that can infer and adapt to varying levels of distribution shift. This allows agents to go beyond the standard requirement of train and test distribution matching and show improvement in scenarios with significant distribution shifts. I will discuss how this framework will allow us to build adaptive and robust simulated robotics systems. + +Relevant papers: + +(1) Offline RL policies should be trained to be adaptive (ICML 2022) +(2) Distributionally Adaptive Meta RL (NeurIPS 2022) +(3) Is conditional generative modeling all you need for decision making? (FMDM Workshop NeurIPS 2022) \ No newline at end of file diff --git a/pasttalks/abstracts/camallen.txt b/pasttalks/abstracts/camallen.txt new file mode 100644 index 0000000..1e28a49 --- /dev/null +++ b/pasttalks/abstracts/camallen.txt @@ -0,0 +1 @@ +The Markov assumption is pervasive in reinforcement learning. By modeling problems as Markov decision processes, agents act as though they can always observe the complete state of the world. 
While this assumption is sometimes a useful fiction, in general decision processes, agents must find ways to cope with only partial information. Classical techniques for partial observability typically require access to unobservable or hard-to-acquire information (like the complete set of possible world states, or knowledge of mutually exclusive potential futures). Meanwhile, modern recurrent neural networks, which rely only on observables and simple forms of memory, have proven remarkably effective in practice, but lack a principled theoretical framework for understanding when and what agents should remember. And yet---despite its flaws---the Markov assumption may offer a path towards precisely this type of understanding. We show that estimating the value of the agent's policy both with and without the Markov assumption leads to a value discrepancy in non-Markov environments that appears to reliably indicate when memory is useful. We present initial progress towards a theory of such value discrepancies, and sketch an algorithm for automatically learning memory functions by uncovering and subsequently minimizing those discrepancies. Our approach suggests that agents can make effective decisions in general decision processes as long as they remember whatever information is necessary for them to trust their value function estimates. \ No newline at end of file diff --git a/pasttalks/abstracts/daveabel.txt b/pasttalks/abstracts/daveabel.txt new file mode 100644 index 0000000..3340147 --- /dev/null +++ b/pasttalks/abstracts/daveabel.txt @@ -0,0 +1 @@ +Reward is the driving force for reinforcement-learning agents. In this talk, I will present our recent NeurIPS paper that explores the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task” that might be of interest: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while Markov reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. I conclude by summarizing recent follow up work that studies alternatives for enriching the expressivity of reward. diff --git a/pasttalks/abstracts/davidpaulius.txt b/pasttalks/abstracts/davidpaulius.txt new file mode 100644 index 0000000..23425dd --- /dev/null +++ b/pasttalks/abstracts/davidpaulius.txt @@ -0,0 +1 @@ +Following work on joint object-action representations, the functional object-oriented network (FOON) was introduced as a knowledge graph representation for robots. Taking the form of a bipartite graph, a FOON contains symbolic (high-level) concepts pertinent to a robot's understanding of its environment and tasks in a way that mirrors human understanding of actions. However, little work has been done to demonstrate how task plans acquired from FOON can be used for task execution by a robot, as the concepts typically found in a FOON are too abstract for immediate execution. To address this, we incorporate a hierarchical task planning approach to translate a FOON graph into a PDDL-based representation of domain knowledge for manipulation planning. 
As a result of this process, a task plan can be acquired that a robot can execute from start to finish, leveraging action contexts and motion primitives in the form of dynamic movement primitives (DMP). Learned action contexts can then be extended to never-before-seen scenarios. \ No newline at end of file diff --git a/pasttalks/abstracts/ericewing.txt new file mode 100644 index 0000000..37ae24b --- /dev/null +++ b/pasttalks/abstracts/ericewing.txt @@ -0,0 +1 @@ +Multi-agent Path Finding (MAPF) is the problem of finding paths for a set of agents to their goals without collisions between agents. It is an important problem for many multi-robot applications, especially in automated warehouses. MAPF has been well studied, with many algorithms developed to solve MAPF instances optimally. However, the characteristics of these algorithms are poorly understood. No single algorithm dominates the others, and it is hard to determine which algorithm should be used for a given instance, or on which instances algorithms will struggle to find a solution. In this talk, I will present results from two papers that seek to better understand the performance of MAPF algorithms. The first part of the talk will cover our MAPF Algorithm SelecTor (MAPFAST), a deep learning approach to predicting which algorithm will perform the best on a given instance. The second part of the talk will cover the role the betweenness centrality of the environment plays in the empirical difficulty of MAPF instances. diff --git a/pasttalks/abstracts/gauravmahajan.txt new file mode 100644 index 0000000..aed0b09 --- /dev/null +++ b/pasttalks/abstracts/gauravmahajan.txt @@ -0,0 +1,4 @@ +What properties of an MDP allow for generalization in reinforcement learning? How does representation learning help with this? Even though we understand the analogous questions in supervised learning, providing theory for understanding these fundamental questions in reinforcement learning is challenging. + + +In this talk, we will discuss a number of recent works on the statistical and computational views of these questions. We will start by exploring this from a statistical point of view, where we will see algorithmic ideas for sample-efficient reinforcement learning. Then, we will move on to the computational side and give evidence that the computational and statistical views of RL are fundamentally different by showing a surprising computational-statistical gap in reinforcement learning. Along the way, we will make progress on one of the most fundamental questions in reinforcement learning with linear function approximation: Suppose the optimal value function (Q* or V*) is linear in a given d-dimensional feature mapping; is efficient reinforcement learning possible? diff --git a/pasttalks/abstracts/geraudtasse.txt new file mode 100644 index 0000000..55da4cf --- /dev/null +++ b/pasttalks/abstracts/geraudtasse.txt @@ -0,0 +1 @@ +Reinforcement learning has achieved recent success in a number of difficult, high-dimensional environments. However, these methods generally require millions of samples from the environment to learn optimal behaviors, limiting their real-world applicability. Hence, this work aims to create a principled framework for lifelong agents to learn essential skills and combine them to solve new compositional tasks without further learning.
To achieve this, we design useful representations of skills for each task and construct a Boolean algebra over the set of tasks and skills. This enables us to compose learned skills to immediately solve new tasks that are expressible as a logical composition of past tasks. We present theoretical guarantees for our framework and demonstrate its usefulness for lifelong learning via a number of experiments. \ No newline at end of file diff --git a/pasttalks/abstracts/gregorystein.txt new file mode 100644 index 0000000..a3055a4 --- /dev/null +++ b/pasttalks/abstracts/gregorystein.txt @@ -0,0 +1 @@ +The next generation of service and assistive robots will need to operate under uncertainty, expected to complete tasks and perform well despite missing information about the state of the world or the needs of future agents. Many existing approaches turn to learning to overcome the challenges of planning under uncertainty, yet are often brittle and myopic, limiting their effectiveness. Our work introduces a model-based approach to long-horizon planning under uncertainty that augments (rather than replaces) planning with estimates from learning, allowing for both high performance and reliability-by-design. In this talk, I will present a number of recent and ongoing projects that leverage our high-level planning abstraction to improve navigation and task planning in partially-mapped environments. I will additionally discuss how our high-level planning abstraction affords capabilities unique in this domain, including explanation generation, single-shot interventions (online behavior correction), and deployment-time reward estimator selection. Critically, I will present a unified perspective on these recent advances and show how they may be made compatible with STRIPS-like planning (e.g., via PDDL), and how our "learning-augmented PDDL" will allow for common-sense-like behaviors in both fully- and partially-revealed environments. \ No newline at end of file diff --git a/pasttalks/abstracts/harshakokel.txt new file mode 100644 index 0000000..7052770 --- /dev/null +++ b/pasttalks/abstracts/harshakokel.txt @@ -0,0 +1 @@ +In many real-world domains, e.g., driving, the state space of offline planning is rather different from the state space of online execution. Inspired by such sequential decision-making problems, we propose RePReL, a bi-level framework that integrates planning and RL. In RePReL, planning occurs offline, at the level of deciding the route, while execution occurs online and needs to take into account dynamic conditions on the road. The agent typically does not have access to the dynamic part of the state at planning time, nor does it have the computational resources to plan an optimal policy that works for all possible traffic events. The key principle that enables agents to deal with these informational and computational challenges is abstraction. In this talk, I argue that domain-specific knowledge can be leveraged to construct appropriate abstractions and propose a dynamic Statistical Relational Learning (SRL) language for the specification of task-specific abstractions. I present empirical results in various grid world domains and a robotic task to underline the significance of the proposed language for efficient learning and effective transfer across tasks.
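A rough sketch of the bi-level idea described in the RePReL abstract above, assuming a toy 1-D corridor domain: an offline planner orders subgoals over the static part of the state, while each subgoal is handled online by a controller that only sees a task-specific abstraction of the full state. All names (plan_route, abstraction, policy) are illustrative placeholders, not the RePReL implementation.

```python
def plan_route(stops):
    """Offline planning over the static state: fix the order of stops."""
    return sorted(stops)

def abstraction(state, subgoal):
    """Task-specific abstraction: expose only the signed distance to the
    current subgoal, hiding the rest of the state."""
    return subgoal - state["position"]

def policy(abstract_state):
    """Stand-in for a learned low-level controller: step toward the subgoal."""
    return 1 if abstract_state > 0 else -1

def execute(stops, start=0):
    state = {"position": start}
    for subgoal in plan_route(stops):          # decisions made offline
        while state["position"] != subgoal:    # handled online, abstractly
            state["position"] += policy(abstraction(state, subgoal))
    return state["position"]

print(execute(stops=[7, 3, 12]))  # visits 3, 7, 12 in planned order -> prints 12
```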
\ No newline at end of file diff --git a/pasttalks/abstracts/johnpmurray.txt new file mode 100644 index 0000000..4d82d33 --- /dev/null +++ b/pasttalks/abstracts/johnpmurray.txt @@ -0,0 +1 @@ +Mr. John Murray, Principal Investigator and Aviation Department Chair in the School of Engineering and Technology, assisted in the pursuit of a per capita grant with Go Virginia, Department of Housing and Community Development (GoVA-DHCD). Hampton University is an organizing academic member of the newly formed ARISE Alliance. The deadline for GoVA submission was July 30, 2020. Virginia's Autonomous Workforce and Economic Development Program (VADP) is a collaborative partnership between educational institutions, localities, government agencies, industry, and nonprofits. This partnership will be governed by the Autonomy Research Institute for Societal Enhancement (ARISE), a new Virginia 501(c)(4) nonprofit organization founded on May 14, 2020. The Alliance has a three-part mission: 1) to conduct research, education, training, and workforce development in Air-Sea-Land Unmanned Systems and Autonomous Traffic Management (UTM), including Urban/Advanced Air Mobility (UAM/AAM) and related Artificial Intelligence/Machine Learning (AI/ML) principles, 2) to educate students and citizens, and 3) to provide technological advisory service and legal insights to policy makers, industry, and the public on matters relating to Autonomy and Smart City Mobility. ARISE will be based at Fort Monroe in the lower Chesapeake Bay, a location ideal for technology development, testing, training, and implementation in a complex heterogeneous urban environment. ARISE and VADP aim to serve all Hampton Roads localities, the Commonwealth, and the Nation. The HU portion of this program will require release time as specified in the university's Matching Contribution, submitted for the GoVA ARISE-VADP application. There are no current space requirements for this proposal. The indirect cost rate for this program still needs to be determined. There is no technology/equipment/instrumentation included in the budget. No travel costs are required for this proposal. diff --git a/pasttalks/abstracts/jorgemendez.txt new file mode 100644 index 0000000..9150f18 --- /dev/null +++ b/pasttalks/abstracts/jorgemendez.txt @@ -0,0 +1,2 @@ +In order to be deployed long-term in the real world, machine learning systems must be able to handle the one thing that is constant: change. Traditional machine learning focuses on stationary tasks, and is therefore incapable of adapting to change. Instead, we need lifelong learners that can accumulate knowledge that enables them to rapidly adjust to new tasks. Ideally, pieces of this accumulated knowledge could be composed in different ways to adapt to the shifting environment. This capability would dramatically improve the performance of machine learning systems in dynamic environments: hate-speech detection models could adapt to social media trends, search-and-rescue robots could handle novel disasters, and student feedback software could adjust to new cohorts. +My research develops algorithms for lifelong or continual learning that leverage the intuition that accumulated knowledge should be compositional. In this talk, I will walk through the lifelong learning problem and motivate the search for compositional knowledge.
My talk will then dive into some of the algorithms that I have developed for lifelong supervised and reinforcement learning, and I will show that these methods enable substantially improved lifelong learning in settings where tasks are highly diverse. Finally, I will briefly discuss some of the open problems in the field and describe my vision for how my future research will address these problems. diff --git a/pasttalks/abstracts/kaiyuzheng.txt new file mode 100644 index 0000000..2744372 --- /dev/null +++ b/pasttalks/abstracts/kaiyuzheng.txt @@ -0,0 +1,3 @@ +People often bet the future progress of robotics on having better representations. I am curious why existing representations seem insufficient, and how people pursue better ones. Driven by this, I focus on reviewing the representations used in two specific robotics domains: object manipulation in unstructured environments, and human-robot interaction for teamwork. + +Note that I had little time to prepare, and the areas are new to me. I take this as an opportunity to make myself learn more and address the questions, which I do think are important. Hope we can discuss and have fun! diff --git a/pasttalks/abstracts/lishuopan.txt new file mode 100644 index 0000000..551c84e --- /dev/null +++ b/pasttalks/abstracts/lishuopan.txt @@ -0,0 +1 @@ +Swarm robotic systems are a novel approach to the coordination of large numbers of robots. Swarms bring desirable properties like robustness, flexibility, and scalability to robotic systems. These systems could be deployed in applications such as on-demand wireless networks, distributed mapping, large-scale localization, environmental monitoring, etc. Swarm robotics differs from conventional multi-robot systems in the scale of the robot group and in the simplicity of the individual robots from which collective intelligence emerges. The limited capability and communication of individual robots make the design of scalable control strategies challenging. In this talk, I aim to present my previous work on learning a decentralized, scalable strategy using knowledge-based neural ordinary differential equations. In the second part, I will present an approach to scalable environmental monitoring, which is empirically shown to be robust to robot and communication failures, using multi-agent reinforcement learning. \ No newline at end of file diff --git a/pasttalks/abstracts/manantomar.txt new file mode 100644 index 0000000..9f86b54 --- /dev/null +++ b/pasttalks/abstracts/manantomar.txt @@ -0,0 +1 @@ +Mirror descent (MD), a well-known first-order method in constrained convex optimization, has recently been shown to be an important tool for analyzing trust-region algorithms in reinforcement learning (RL). However, there remains a considerable gap between such theoretically analyzed algorithms and the ones used in practice. Inspired by this, we propose an efficient RL algorithm, called mirror descent policy optimization (MDPO). MDPO iteratively updates the policy by approximately solving a trust-region problem, whose objective function consists of two terms: a linearization of the standard RL objective and a proximity term that restricts two consecutive policies to be close to each other. Each update performs this approximation by taking multiple gradient steps on this objective function. We derive on-policy and off-policy variants of MDPO, while emphasizing important design choices motivated by the existing theory of MD in RL.
We highlight the connections between on-policy MDPO and two popular trust-region RL algorithms, TRPO and PPO, and show that explicitly enforcing the trust-region constraint is in fact not necessary for achieving high performance in TRPO. We then show how the popular soft actor-critic (SAC) algorithm can be derived by slight modifications of off-policy MDPO. Overall, MDPO is derived from MD principles, offers a unified approach to viewing a number of popular RL algorithms, and performs better than or on par with TRPO, PPO, and SAC in a number of continuous and discrete control tasks. Finally, I will end with a high-level discussion of recent progress made by other papers on this topic. \ No newline at end of file diff --git a/pasttalks/abstracts/max&kaiyu.txt new file mode 100644 index 0000000..7390a77 --- /dev/null +++ b/pasttalks/abstracts/max&kaiyu.txt @@ -0,0 +1 @@ +In this presentation, we will provide a basic introduction to Spot, two of which are new members of our lab. This includes an overview of Spot's mechanics and hardware, off-the-shelf features, how to work with Spot and its safety protocols, as well as software development. Hopefully, after this talk, you will have a more concrete understanding of Spot's capabilities. This talk should also help those who don't work with robots understand some of the unique challenges present in robotics. \ No newline at end of file diff --git a/pasttalks/abstracts/nikunjsaunshi.txt new file mode 100644 index 0000000..71822c9 --- /dev/null +++ b/pasttalks/abstracts/nikunjsaunshi.txt @@ -0,0 +1,3 @@ +While supervised learning sparked the deep learning boom, it has some critical shortcomings: (1) it requires an abundance of expensive labeled data, and (2) it solves tasks from scratch rather than the human-like approach of leveraging knowledge and skills acquired from prior experiences. Pre-training has emerged as an effective alternative paradigm to overcome these shortcomings, whereby a model is first trained using easily acquirable data and later used to solve downstream tasks of interest with far less labeled data than supervised learning. Pre-training using unlabeled data, a.k.a. self-supervised learning, has been especially revolutionary, with successes in diverse domains: text, vision, speech, etc. This raises an interesting and challenging question: why should pre-training on unlabeled data help with seemingly unrelated downstream tasks? + +In this talk, I will discuss works that initiate and build a theoretical framework to study why self-supervised learning is beneficial for downstream tasks. The framework is applied to methods like contrastive learning, auto-regressive language modeling, and self-prediction-based methods. Central to the framework is the idea that pre-training helps learn low-dimensional representations of data that subsequently help solve downstream tasks of interest with linear classifiers, requiring less labeled data. A common theme is to formalize the desirable properties of the unlabeled data distribution that is used to construct the self-supervised learning task. Under appropriate formalizations, it can be shown that approximately minimizing the right pre-training objectives can extract the downstream signal that is implicitly encoded in the unlabeled data distribution.
Finally, it is shown that this signal can be decoded from the learned representations using linear classifiers, thus providing a formalization for transference of “skills and knowledge” across tasks. diff --git a/pasttalks/abstracts/pablocastro.txt new file mode 100644 index 0000000..13c4ebd --- /dev/null +++ b/pasttalks/abstracts/pablocastro.txt @@ -0,0 +1 @@ +We present a new behavioral distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark. \ No newline at end of file diff --git a/pasttalks/abstracts/poojaandcourtney.txt new file mode 100644 index 0000000..dd14336 --- /dev/null +++ b/pasttalks/abstracts/poojaandcourtney.txt @@ -0,0 +1 @@ +Recent research at Draper has studied the role of trust in human-machine teaming. Ocular metrics using eye-tracking technology have been shown to be reliable indicators of physiological measures such as fatigue, attention, and stress. However, current literature in this area lacks testing with electrooculography (EOG) technology and in dynamic environments. Using EOG glasses developed by the MIT Media Lab, we plan to improve the accuracy of certain psychophysiological measurements in dynamic environments, such as in a military context. Currently in the early stages of this new project, we are open to collaboration and to developing a direction for how these measurements may be integrated with robotics to further research in other areas. \ No newline at end of file diff --git a/pasttalks/abstracts/samlobel.txt new file mode 100644 index 0000000..861fdbe --- /dev/null +++ b/pasttalks/abstracts/samlobel.txt @@ -0,0 +1 @@ +Count-based exploration can lead to optimal reinforcement learning in small tabular domains. However, it is challenging to keep track of visitation counts in environments with large state spaces. Previous work in this area has converted the problem of learning visitation counts to that of learning a restrictive form of a density model over the state space. Rather than optimizing a surrogate objective, our proposed algorithm constructs a standard, MSE-based optimization procedure that regresses to a state's visitation count. The one-sentence summary of how: we notice that the variance of the sample mean of random variables scales with the inverse count, and we target that quantity with optimization. Compared to previous work, we show that our method is significantly more effective at deducing ground-truth visitation frequencies; when used as an exploration bonus for a model-free reinforcement learning algorithm, our method outperforms existing approaches.
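To make the inverse-count observation above concrete: if a state has been visited n times and each visit is paired with an independent ±1 draw, the sample mean m of those draws satisfies E[m^2] = Var(m) = 1/n, so a squared-mean regression target implicitly encodes the visitation count. A minimal numerical check of this relationship (an illustrative sketch, not code from the talk):

```python
import numpy as np

# For each n, simulate many states visited n times, each visit paired with an
# independent ±1 draw, and check that the squared sample mean averages to 1/n.
rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    means = rng.choice([-1.0, 1.0], size=(100_000, n)).mean(axis=1)
    print(f"n={n:4d}  E[m^2] ~ {np.mean(means**2):.5f}  1/n = {1/n:.5f}")
```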
\ No newline at end of file diff --git a/pasttalks/abstracts/shivamvats.txt new file mode 100644 index 0000000..b4af147 --- /dev/null +++ b/pasttalks/abstracts/shivamvats.txt @@ -0,0 +1 @@ +Robot learning requires large amounts of training data that is often expensive and time-consuming to obtain. This is commonly addressed by designing more sample-efficient learning methods. However, there is a large variation in the cost of acquiring different types of data in robotics. For example, human demonstrations are substantially more expensive than simulation time. In this talk, I will discuss our recently proposed algorithms that explicitly reason about the cost of acquiring data during learning so as to optimize task performance while keeping costs down. Importantly, our algorithms choose not to train certain parts of the system if they are redundant or if the cost of training is not justified by the improvement in performance. I will discuss resource optimization in the context of collaborative manufacturing, where we show that it can bring down the cost of manufacturing, and sequential manipulation under uncertainty, where we are able to learn more robust and effective recovery skills using a given training budget. \ No newline at end of file diff --git a/pasttalks/abstracts/skyethompson1.txt new file mode 100644 index 0000000..82cb3c2 --- /dev/null +++ b/pasttalks/abstracts/skyethompson1.txt @@ -0,0 +1 @@ +We propose a new, data-efficient approach for skill transfer to novel objects, accounting for known categorical shape variation. A low-dimensional shape representation embedding is learned from a set of deformations, sampled between known objects within a category. This latent representation is mapped to a set of control parameters that result in successful execution of a category-level skill on that object. This method generalizes a learned manipulation policy to unseen objects with few training examples. We demonstrate this approach on pouring from cups and scooping with spatulas, where there is complex, nonlinear variation of successful control parameters across objects. \ No newline at end of file diff --git a/pasttalks/abstracts/tuluhanakbulut.txt new file mode 100644 index 0000000..8bdcd01 --- /dev/null +++ b/pasttalks/abstracts/tuluhanakbulut.txt @@ -0,0 +1 @@ +Skill learning is a hallmark of intelligent behavior, one that Robot Learning aims to give to robots. An effective approach is to teach an initial version of the skill by demonstration, a form of Supervised Learning (SL) called Learning from Demonstrations (LfD), and then let the robot improve it and adapt to novel tasks via Reinforcement Learning (RL). In this talk, I will first introduce a novel LfD+RL framework, Adaptive Conditional Neural Movement Primitives (ACNMP), that utilizes LfD and RL together during adaptation and makes demonstrations and RL-guided trajectories share the same latent representation space. We show through simulation experiments that (i) ACNMP successfully adapts the skill using an order of magnitude fewer trajectory samples than baselines; (ii) its simultaneous training method preserves the demonstration characteristics; and (iii) ACNMP enables skill transfer between robots with different morphologies. Our real-world experiments verify the suitability of ACNMP in applications where non-linearity and the number of dimensions increase.
Next, we extend the idea of using Supervised Learning in reward-based skill learning tasks and propose our second framework, Reward Conditioned Neural Movement Primitives (RC-NMP), where learning is done using only Supervised Learning. RC-NMP takes rewards as input and generates trajectories conditioned on desired rewards. The model uses variational inference to create a stochastic latent representation space from which varying trajectories are sampled to create a trajectory population. Finally, the diversity of the population is increased using crossover and mutation operations from Evolutionary Strategies to handle environments with sparse rewards, multiple solutions, or local minima. RC-NMP samples trajectories from high-reward landscapes and progressively finds better trajectories. Our simulation and real-world experiments show that RC-NMP is more stable and efficient than ACNMP and two other robotic RL algorithms. diff --git a/pasttalks/abstracts/wilkacarvalho.txt new file mode 100644 index 0000000..0b93634 --- /dev/null +++ b/pasttalks/abstracts/wilkacarvalho.txt @@ -0,0 +1,4 @@ +To generalize across object-centric tasks, a reinforcement learning (RL) agent needs to exploit the structure that objects induce. However, it's not clear how to incorporate objects into an agent's architecture or objective function in a flexible way. Prior work has either hard-coded object-centric features or used inductive biases with strong assumptions. Yet these approaches have had limited success in enabling general RL agents. Part of what gives objects their utility is that they enable an agent to break up and recombine its experience. Motivated by this, we propose “separate and integrate”, a motif for weak inductive biases aimed at enabling an agent to break up and recombine its basic computations: estimating state, predicting value, etc. + + +We present initial results with "Feature-Attending Recurrent Modules” (FARM), an architecture that separates and integrates state across multiple state modules. Additionally, each module attends to spatiotemporal features with an expressive feature attention mechanism. This enables FARM to represent diverse object-induced spatial and temporal regularities across subsets of modules. We hypothesize that this enables an RL agent to flexibly recombine its experiences to generalize across object-centric tasks. We study task suites in both 2D and 3D environments and find that FARM generalizes better than competing architectures that leverage attention or multiple modules. diff --git a/pasttalks/abstracts/xinminghuang.txt new file mode 100644 index 0000000..a3ebacb --- /dev/null +++ b/pasttalks/abstracts/xinminghuang.txt @@ -0,0 +1 @@ +Research on autonomous vehicle technology has been growing rapidly in recent years. This talk presents our recent work on LiDAR point cloud processing for perception tasks in autonomous driving, including point cloud classification, semantic segmentation, panoptic segmentation, and 3D point cloud depth completion. The unique research contribution is to combine traditional computer vision algorithms with deep learning models such that the approach can achieve state-of-the-art performance at much lower complexity. Furthermore, these efficient network models are targeted at GPU and/or FPGA hardware platforms to demonstrate real-time processing for autonomous vehicles.
Most of the research results are evaluated on the KITTI dataset. The research team also built a full-size autonomous vehicle prototype for data collection and experimentation. \ No newline at end of file