In this paper, we propose a robust policy evaluation algorithm in reinforcement learning, to feature outlier contamination and heavy-tailed reward distributions. We further develop a fully-online method to conduct statistical inference for the modeling parameters. Our method converges faster to the minimum asymptotic variance than the classical temporal difference (TD) learning and avoids the selection of the step sizes. Numerical experiments are provided on the effectiveness of the proposed algorithm in real-world reinforcement learning experiments, which highlight the efficiency and robustness of our approach when compared to the existing online bootstrap method. This work is joint with Jiyuan Tu (SUFE), Xi Chen (NYU), and Weidong Liu (SJTU).

7月18日
4:00pm - 5:00pm
地点
Room 2302 (Lifts 17/18)
讲者/表演者
Prof. Yichen ZHANG
Purdue University
主办单位
Department of Mathematics
联系方法
付款详情
对象
Alumni, Faculty and staff, PG students, UG students
语言
英语
其他活动
5月11日
研讨会, 演讲, 讲座
IAS / School of Science Joint Lecture - Regioselective Pyridine C-H-Functionalization and Skeletal Editing
Abstract Pyridines belong to the most abundant heteroarenes in medicinal chemistry and in agrochemical industry. In the lecture, highly regioselective pyridine C-H functionalization through a d...
1月20日
研讨会, 演讲, 讲座
IAS / School of Science Joint Lecture - A Journey to Defect Science and Engineering
Abstract A defect in a material is one of the most important concerns when it comes to modifying and tuning the properties and phenomena of materials. The speaker will review his stud...