In this paper, we propose a robust policy evaluation algorithm for reinforcement learning that accommodates outlier contamination and heavy-tailed reward distributions. We further develop a fully online method for conducting statistical inference on the model parameters. Our method converges to the minimum asymptotic variance faster than classical temporal difference (TD) learning and avoids the selection of step sizes. Numerical experiments on real-world reinforcement learning tasks demonstrate the effectiveness of the proposed algorithm and highlight its efficiency and robustness compared with the existing online bootstrap method. This work is joint with Jiyuan Tu (SUFE), Xi Chen (NYU), and Weidong Liu (SJTU).
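
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of classical tabular TD(0) policy evaluation. It is not the speaker's robust algorithm. The two-state chain, its transition probabilities, the reward noise level, and the function name `td0_policy_evaluation` are all hypothetical and chosen only for illustration. The sketch shows where the step size enters as a tuning parameter, which is the choice the abstract says the proposed method avoids.

```python
import random


def td0_policy_evaluation(num_steps=400_000, gamma=0.9, step_size=0.005, seed=0):
    """Classical tabular TD(0) on a hypothetical two-state Markov reward process."""
    rng = random.Random(seed)
    # Assumed toy dynamics: probability of moving to state 0 from each state.
    p_to_state0 = [0.9, 0.5]
    # Assumed mean rewards; Gaussian noise is added to each observed reward.
    mean_reward = [1.0, 0.0]

    values = [0.0, 0.0]  # value estimates, one per state
    state = 0
    for _ in range(num_steps):
        next_state = 0 if rng.random() < p_to_state0[state] else 1
        reward = mean_reward[state] + rng.gauss(0.0, 0.5)
        # TD error: observed one-step return minus the current estimate.
        td_error = reward + gamma * values[next_state] - values[state]
        # The step size must be chosen by the user in classical TD learning.
        values[state] += step_size * td_error
        state = next_state
    return values


estimates = td0_policy_evaluation()
print(estimates)
```

For this toy chain the exact values solve V = r + gamma * P * V, which gives V(0) = 8.59375 and V(1) = 7.03125, so the printed estimates can be compared against them. A single large reward outlier would shift these estimates in proportion to the step size, which is the sensitivity that robust variants aim to reduce.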

July 18
4:00pm - 5:00pm
Venue
Room 2302 (Lifts 17/18)
Speaker
Prof. Yichen ZHANG
Purdue University
Organizer
Department of Mathematics
Audience
Alumni, Faculty and staff, PG students, UG students
Language
English