Reinforcement learning with factorization
Presenter
August 3, 2021
Abstract
In the setup of single-agent reinforcement learning viewed through the framework of a Markov Decision Process, there are typically two primary challenges: (a) learning a good or optimal policy, given access to the model or a simulator; and (b) identifying the model from limited observed data, potentially generated under a sub-optimal and unknown policy.
Like the singular value decomposition of a matrix, the spectral decomposition or factorization of a “nice” multi-variate function suggests that it can be represented as a finite or countably infinite sum of products of functions of the individual variables. In this talk, we shall discuss how factorization of the Q-function can help design sample-efficient learning with access to a model simulator, and how factorization of the transition kernel can help learn the model from a single trajectory per agent in the setting of offline reinforcement learning with heterogeneous agents.
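To make the analogy concrete, here is a minimal illustrative sketch (not the method of the papers below): for a small finite MDP, the Q-function is a states-by-actions matrix, and a factorization Q(s, a) ≈ Σ_k f_k(s) g_k(a) is exactly a low-rank matrix approximation, recoverable by truncated SVD from noisy entries. All names, sizes, and the noise model here are assumptions for illustration only.

```python
# Illustrative sketch (assumptions, not the papers' algorithm):
# factorize a tabular Q-function as Q(s, a) ~= sum_k f_k(s) * g_k(a)
# via truncated SVD, given noisy estimates of its entries.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, rank = 50, 10, 3  # hypothetical small finite MDP

# Synthetic ground-truth low-rank Q: Q = F @ G.T, with
# F (states x rank) playing the role of f_k(s) and G (actions x rank) of g_k(a).
F = rng.normal(size=(n_states, rank))
G = rng.normal(size=(n_actions, rank))
Q = F @ G.T

# Pretend we only have noisy estimates of Q (e.g., Monte Carlo rollouts from a simulator).
Q_noisy = Q + 0.1 * rng.normal(size=Q.shape)

# Truncated SVD recovers a rank-r approximation, i.e., the factor functions
# evaluated on the discrete state and action sets.
U, s, Vt = np.linalg.svd(Q_noisy, full_matrices=False)
Q_hat = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

print("relative error:", np.linalg.norm(Q_hat - Q) / np.linalg.norm(Q))
```

The same low-rank viewpoint applies to a transition kernel P(s' | s, a) viewed as a matrix over (s, a) pairs and next states, which is the object factorized in the offline, heterogeneous-agent setting.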
This talk is based on the following joint works:
Q-function learning: https://arxiv.org/abs/2006.06135
Offline personalized model learning: https://arxiv.org/abs/2102.06961
"