
On the effectiveness of nonconvex policy optimization

Presenter
August 4, 2021
Abstract
Recent years have witnessed a flurry of activity in solving reinforcement learning problems via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple first-order methods have been remarkably successful in practice. The theoretical footing, however, had been largely lacking until recently. In this talk, we make progress towards understanding the efficacy of policy-gradient-type algorithms with softmax parameterization, a family of nonconvex policy optimization algorithms widely used in modern reinforcement learning. On the one hand, we demonstrate that softmax policy gradient methods can take (super-)exponential time to converge, even with a benign initialization and an initial state distribution amenable to optimization. On the other hand, we show that employing natural policy gradients and enforcing entropy regularization allow for nearly dimension-free global linear convergence.
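To make the update rule behind the second result concrete, the following is a minimal numerical sketch (not the presenter's code): a small, randomly generated tabular MDP with a softmax-parameterized policy, where the entropy-regularized natural policy gradient step reduces to a simple update on the logits combining a shrinkage term with the soft Q-values. All sizes, constants, and helper names (P, R, soft_q_values, and the step-size scaling) are illustrative assumptions.

```python
import numpy as np

# Hypothetical tabular MDP, sized for illustration only.
rng = np.random.default_rng(0)
S, A = 4, 3                   # number of states and actions
gamma, tau, eta = 0.9, 0.1, 0.5  # discount, entropy weight, step size
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                # R[s, a] = immediate reward

def softmax_policy(theta):
    """Softmax parameterization: pi(a|s) proportional to exp(theta[s, a])."""
    z = theta - theta.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def soft_q_values(pi, iters=1000):
    """Entropy-regularized Q^pi_tau by iterating the soft Bellman evaluation equation."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)  # soft state values
        Q = R + gamma * P @ V                                   # soft Bellman backup
    return Q

theta = np.zeros((S, A))
for t in range(50):
    pi = softmax_policy(theta)
    Q = soft_q_values(pi)
    # Entropy-regularized NPG step in the softmax/tabular case: the logits shrink
    # toward zero and move along the soft Q-values (discount-dependent constants
    # are absorbed into eta here; per-state normalization is handled by the softmax).
    theta = (1 - eta * tau) * theta + eta * Q
```

Under this sketch, the shrinkage factor (1 - eta * tau) is what entropy regularization contributes, and it is the mechanism that yields the linear (geometric) convergence discussed in the talk; setting tau = 0 recovers an unregularized natural policy gradient step on the logits.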