In this paper, we propose a robust policy evaluation algorithm for reinforcement learning that accommodates outlier contamination and heavy-tailed reward distributions. We further develop a fully online method for conducting statistical inference on the model parameters. Our method converges to the minimum asymptotic variance faster than classical temporal difference (TD) learning and avoids the need to select step sizes. Numerical experiments on real-world reinforcement learning tasks demonstrate the effectiveness of the proposed algorithm and highlight the efficiency and robustness of our approach compared with the existing online bootstrap method. This is joint work with Jiyuan Tu (SUFE), Xi Chen (NYU), and Weidong Liu (SJTU).

18 Jul 2023
4:00pm - 5:00pm
Where
Room 2303 (Lifts 17/18)
Speakers/Performers
Prof. Yichen ZHANG
Purdue University
Organizer(s)
Department of Mathematics
Audience
Alumni, Faculty and staff, PG students, UG students
Language(s)
English