
On the effectiveness of nonconvex policy optimization

Presenter
August 4, 2021
Abstract
Recent years have witnessed a flurry of activity in solving reinforcement learning problems via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple first-order methods have been remarkably successful in practice. The theoretical footing, however, had been largely lacking until recently. In this talk, we make progress towards understanding the efficacy of policy-gradient-type algorithms with softmax parameterization, a family of nonconvex policy optimization algorithms widely used in modern reinforcement learning. On the one hand, we demonstrate that softmax policy gradient methods can take (super-)exponential time to converge, even with a benign initialization and an initial state distribution amenable to optimization. On the other hand, we show that employing natural policy gradients and enforcing entropy regularization allow for nearly dimension-free global linear convergence.
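To make the update rule behind the second result concrete, the following is a minimal numerical sketch (not the presenter's code): a small, randomly generated tabular MDP with a softmax-parameterized policy, where the entropy-regularized natural policy gradient step reduces to a simple update on the logits combining a shrinkage term with the soft Q-values. All sizes, constants, and helper names (P, R, soft_q_values, and the step-size scaling) are illustrative assumptions.

```python
import numpy as np

# Hypothetical tabular MDP, sized for illustration only.
rng = np.random.default_rng(0)
S, A = 4, 3                   # number of states and actions
gamma, tau, eta = 0.9, 0.1, 0.5  # discount, entropy weight, step size
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                # R[s, a] = immediate reward

def softmax_policy(theta):
    """Softmax parameterization: pi(a|s) proportional to exp(theta[s, a])."""
    z = theta - theta.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def soft_q_values(pi, iters=1000):
    """Entropy-regularized Q^pi_tau by iterating the soft Bellman evaluation equation."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)  # soft state values
        Q = R + gamma * P @ V                                   # soft Bellman backup
    return Q

theta = np.zeros((S, A))
for t in range(50):
    pi = softmax_policy(theta)
    Q = soft_q_values(pi)
    # Entropy-regularized NPG step in the softmax/tabular case: the logits shrink
    # toward zero and move along the soft Q-values (discount-dependent constants
    # are absorbed into eta here; per-state normalization is handled by the softmax).
    theta = (1 - eta * tau) * theta + eta * Q
```

Under this sketch, the shrinkage factor (1 - eta * tau) is what entropy regularization contributes, and it is the mechanism that yields the linear (geometric) convergence discussed in the talk; setting tau = 0 recovers an unregularized natural policy gradient step on the logits.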