In this paper, we propose a robust policy evaluation algorithm for reinforcement learning that accommodates outlier contamination and heavy-tailed reward distributions. We further develop a fully online method for statistical inference on the model parameters. Our method converges to the minimum asymptotic variance faster than classical temporal difference (TD) learning and avoids the selection of step sizes. Numerical experiments demonstrate the effectiveness of the proposed algorithm in real-world reinforcement learning tasks and highlight its efficiency and robustness compared with the existing online bootstrap method. This work is joint with Jiyuan Tu (SUFE), Xi Chen (NYU), and Weidong Liu (SJTU).

July 18
4:00pm - 5:00pm
Venue
Room 2302 (Lifts 17/18)
Speaker/Performer
Prof. Yichen ZHANG
Purdue University
Organizer
Department of Mathematics
Audience
Alumni, Faculty and staff, PG students, UG students
Language
English