Preference-based RL with finite-time guarantees
Presenter
August 2, 2021
Abstract
As reinforcement learning is used to solve increasingly complex problems, eliciting meaningful labels and rewards for supervision becomes challenging. Preferences in the form of pairwise comparisons have emerged as an alternative feedback mechanism that is often easier to elicit and more accurate. Despite promising results in applications, the theoretical understanding of preference-based RL is still in its infancy. This talk will outline our efforts in understanding the fundamental limits of learning when given access to both preferences and labels, algorithms that achieve those limits, and some open questions.