Abstract
In the setting of single-agent reinforcement learning, viewed through the framework of a Markov Decision Process, there are typically two primary challenges: (a) given access to the model (or a simulator), learning a good or optimal policy, and (b) identifying the model from limited observed data, potentially generated under a sub-optimal and unknown policy. Like the singular value decomposition of a matrix, the spectral decomposition or factorization of a "nice" multi-variate function suggests that it can be represented as a finite or countably infinite sum of products of functions of the individual variables. In this talk, we shall discuss how factorization of the Q-function can help design sample-efficient learning with access to a model simulator, and how factorization of the transition kernel can help learn the model from a single trajectory per agent in the setting of offline reinforcement learning with heterogeneous agents. This is based on joint works on Q-function learning (https://arxiv.org/abs/2006.06135) and offline personalized model learning (https://arxiv.org/abs/2102.06961).
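
As a rough illustration of the analogy the abstract draws (the notation below is a sketch, not taken from the talk or the cited papers): just as a rank-k matrix M factors through its singular value decomposition, a "nice" Q-function or transition kernel can be written as a finite sum of products of functions of the individual variables, for instance

  M = \sum_{r=1}^{k} \sigma_r \, u_r v_r^{\top},
  \qquad
  Q(s, a) = \sum_{r=1}^{k} f_r(s) \, g_r(a),
  \qquad
  P(s' \mid s, a) = \sum_{r=1}^{k} u_r(s') \, v_r(s, a),

where the factors f_r, g_r, u_r, v_r and the rank k are placeholders for whatever latent structure the particular method estimates.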