Optimal methods for reinforcement learning: Efficient algorithms with instance-dependent guarantees | Wenlong Mou
Reinforcement learning (RL) is a pillar of modern artificial intelligence. Compared to classical statistical learning, RL problems exhibit several new statistical and computational phenomena, leading to different trade-offs in the choice of estimators, the tuning of their parameters, and the design of efficient algorithms. In many settings, asymptotic and/or worst-case theory fails to provide the relevant guidance.
In this talk, I present recent advances that take a more refined approach to RL, one that leads to non-asymptotic and instance-optimal guarantees. The bulk of the talk focuses on function approximation methods for policy evaluation. I establish a novel class of optimal, instance-dependent oracle inequalities for projected Bellman equations, as well as efficient computational algorithms that achieve them. Among other results, I will highlight how the instance-optimal guarantees guide the selection of tuning parameters in temporal difference methods, and how they address the instability issues that arise with general function classes. Drawing on this perspective, I will also discuss a novel class of stochastic approximation methods that yield optimal statistical guarantees for policy optimization problems.
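To make the objects above concrete, here is a minimal sketch of temporal difference learning (TD(0)) with linear function approximation on a small, hypothetical 3-state Markov reward process; the example, its features, and the polynomial step-size choice are illustrative assumptions, not details from the talk. The closed-form solution of the projected Bellman equation is computed alongside for comparison.

```python
import numpy as np

# Hypothetical 3-state Markov reward process, purely for illustration.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])      # doubly stochastic => uniform stationary law
r = np.array([1.0, 0.0, -1.0])       # expected one-step rewards
gamma = 0.9
Phi = np.array([[1.0, 0.0],          # feature vectors, one row per state
                [0.0, 1.0],
                [1.0, 1.0]])

# Closed-form solution of the projected Bellman equation
# Phi^T D (I - gamma P) Phi theta = Phi^T D r; here D is uniform, so it cancels.
theta_star = np.linalg.solve(Phi.T @ (np.eye(3) - gamma * P) @ Phi, Phi.T @ r)

rng = np.random.default_rng(0)
theta = np.zeros(2)
avg, n_avg = np.zeros(2), 0
s = 0
T = 200_000
for t in range(1, T + 1):
    s_next = rng.choice(3, p=P[s])
    alpha = 1.0 / t ** 0.7           # polynomial step size: a key tuning parameter
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * td_error * Phi[s]
    if t > T // 2:                   # Polyak-Ruppert averaging over the tail
        avg += theta
        n_avg += 1
    s = s_next
theta_td = avg / n_avg

print(theta_star, theta_td)          # the averaged TD iterate tracks theta_star
```

The step-size exponent and the iterate averaging used here are exactly the kind of tuning choices that instance-dependent guarantees are meant to inform.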