Graph Neural Thompson Sampling

Abstract

We consider an online decision-making problem with a reward function defined over graph-structured data. We formally formulate the problem as an instance of graph action bandit. We then propose GNN-TS, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm which employs a GNN approximator for estimating the mean reward function and the graph neural tangent features for uncertainty estimation. We prove that, under certain boundness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound which is (1) sub-linear in the number of interaction rounds and (2) independent of the number of graph nodes. Empirical results validate that our proposed GNN-TS exhibits competitive performance and scales well on graph action bandit problems.

Publication
RLC2024
Shuang Wu
Shuang Wu
Ph.D. in Statistics

My research interests include ML on Graphs and Bandit Algorithms.

Related