17 Oct 2022
Seminar, Lecture, Talk
Department of Mathematics - Seminar on Statistics and Data Science - Exponential Lower Bounds and Fast Convergence for Policy Optimization
Policy gradient (PG) methods and their variants lie at the heart of modern reinforcement learning. However, because value maximization is intrinsically non-concave, the theoretical underpinnings of PG-type methods remained limited until recently.