10月17日
研讨会, 演讲, 讲座
Department of Mathematics - Seminar on Statistics and Data Science - Exponential Lower Bounds and Fast Convergence for Policy Optimization
Policy gradient (PG) methods and their variants lie at the heart of modern reinforcement learning. Due to the intrinsic non-concavity of value maximization, however, the theoretical underpinnings of PG-type methods have been limited even until recently.