MathInstitutes.org

Curbing Our Enthusiasm: Constraining Decision Policies Learned from the Past to Ensure Good Futures

Presenter

Emma Brunskill

February 25, 2020

Abstract

Emma Brunskill, Stanford University

There is growing interest in batch off-policy reinforcement learning (RL), spurred in part by the vast datasets of prior decisions and their outcomes. Yet off-policy RL can be challenging, with well-known divergence results. In this talk I'll summarize some of our work on off-policy evaluation and off-policy optimization, including a structural minimization style result for guaranteeing future performance, and practical algorithms that we have used to quickly learn personalized policies from historical data for a high-fidelity diabetes simulator.
