Shuang Wu
nonparametric statistics
Exploration in Bandit Algorithms
We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is residual bootstrap exploration: the agent estimates the next-step reward by resampling the residuals of its mean-reward estimate.
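To make the idea concrete, here is a minimal Python sketch of residual-bootstrap exploration on a toy linear bandit. It is an illustration under stated assumptions, not the paper's exact procedure: it uses a ridge-regularized least-squares fit and the classical residual bootstrap (resampling residuals with replacement), and the names `bootstrap_arm_estimates`, `run_bandit`, `ridge`, and `noise_sd` are hypothetical.

```python
import numpy as np


def bootstrap_arm_estimates(X, y, arms, rng, ridge=1.0):
    """Return one residual-bootstrapped reward estimate per arm.

    Assumption: ridge least squares plus the classical residual bootstrap;
    the paper's actual estimator and resampling scheme may differ.
    """
    d = X.shape[1]
    gram = X.T @ X + ridge * np.eye(d)
    # Mean-reward estimate from the history collected so far.
    theta_hat = np.linalg.solve(gram, X.T @ y)
    residuals = y - X @ theta_hat
    # Resample the residuals with replacement and build pseudo-rewards.
    resampled = rng.choice(residuals, size=residuals.shape[0], replace=True)
    y_star = X @ theta_hat + resampled
    # Refit on the pseudo-rewards; the randomness in theta_star drives exploration.
    theta_star = np.linalg.solve(gram, X.T @ y_star)
    return arms @ theta_star


def run_bandit(arms, theta_true, horizon, noise_sd=0.1, seed=0):
    """Simulate the sketch on a synthetic problem and return the chosen arms."""
    rng = np.random.default_rng(seed)
    n_arms, d = arms.shape
    X = np.empty((0, d))
    y = np.empty(0)
    chosen = []
    for t in range(horizon):
        if t < n_arms:
            a = t  # pull every arm once so residuals exist
        else:
            a = int(np.argmax(bootstrap_arm_estimates(X, y, arms, rng)))
        reward = arms[a] @ theta_true + noise_sd * rng.normal()
        X = np.vstack([X, arms[a]])
        y = np.append(y, reward)
        chosen.append(a)
    return chosen


chosen = run_bandit(np.eye(3), np.array([0.1, 0.5, 0.9]), horizon=200)
```

In this sketch the agent acts greedily with respect to the bootstrapped estimates, so exploration comes only from the resampling noise rather than from an explicit confidence bonus.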
Residual Bootstrap Exploration for Stochastic Linear Bandit
Shuang Wu, Chi-Hua Wang, Yuantong Li, Guang Cheng