
A Lyapunov approach for finite-sample convergence bounds with off-policy RL

Presenter
August 3, 2021
Abstract
In this talk, we derive finite-sample convergence bounds for Markovian stochastic approximation, using the generalized Moreau envelope as a Lyapunov function. We show that this result enables us to derive finite-sample bounds for a large class of value-based asynchronous reinforcement learning (RL) algorithms. Specifically, we establish finite-sample mean-square convergence bounds for asynchronous RL algorithms such as Q-learning, n-step TD, TD(λ), and off-policy TD algorithms including V-trace. As a by-product, by analyzing the convergence bounds of n-step TD and TD(λ), we provide theoretical insight into the bias-variance trade-off, i.e., the efficiency of bootstrapping in RL. This talk is based on joint work with Zaiwei Chen, Siva Theja Maguluri, and Karthikeyan Shanmugam.
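
To make the setting concrete, below is a minimal LaTeX sketch of the kind of recursion and Lyapunov function the talk concerns. The notation (x_k, F, Y_k, f, g, θ) is ours, and the precise assumptions are those stated in the talk, not reproduced here.

    % Markovian stochastic approximation: the iterate is driven by samples
    % from an underlying Markov chain {Y_k} (notation assumed, not the
    % speakers' exact statement).
    \[
      x_{k+1} = x_k + \epsilon_k \bigl( F(x_k, Y_k) - x_k \bigr)
    \]
    % Assume x \mapsto \mathbb{E}[F(x, Y)] is a contraction with respect to
    % some norm \|\cdot\|_c, which may be non-smooth (e.g., the max-norm
    % arising in Q-learning). One common form of the generalized Moreau
    % envelope smooths f(x) = \tfrac{1}{2}\|x\|_c^2 through a smooth convex g:
    \[
      M(x) = \min_{u \in \mathbb{R}^d}
        \Bigl\{ f(u) + \frac{1}{2\theta}\, g(x - u) \Bigr\}
    \]
    % M is smooth, and a drift (Lyapunov) argument on M(x_k - x^*) yields
    % finite-sample mean-square bounds on \|x_k - x^*\|_c^2.

One reason a smoothed envelope is natural here: the max-norm contraction underlying Q-learning does not come with a smooth norm-squared Lyapunov function, so a smooth surrogate of the norm is used in its place.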