Graph Neural Thompson Sampling

June 2024

Abstract

We consider an online decision-making problem with a reward function defined over graph-structured data. We formalize the problem as an instance of the graph action bandit. We then propose GNN-TS, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm that uses a GNN approximator to estimate the mean reward function and graph neural tangent features for uncertainty estimation. We prove that, under certain boundedness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound that is (1) sub-linear in the number of interaction rounds and (2) independent of the number of graph nodes. Empirical results validate that GNN-TS exhibits competitive performance and scales well on graph action bandit problems.
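To make the select-then-update loop of Thompson Sampling concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's GNN-TS implementation. Each candidate graph is assumed to come with a fixed feature vector (a stand-in for the graph neural tangent features), a ridge-regression estimate replaces the GNN mean-reward approximator, and rewards in the demo are noise-free. All names (`FeatureThompsonSampler`, `lam`, `nu`, `run_demo`) and the toy feature and reward values are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


class FeatureThompsonSampler:
    """Thompson Sampling over fixed per-action feature vectors.

    Assumption: a linear (ridge) mean estimate stands in for the GNN,
    and the given features stand in for graph neural tangent features.
    """

    def __init__(self, dim, lam=1.0, nu=1.0, seed=0):
        self.nu = nu  # exploration scale (hypothetical parameter name)
        # Inverse of the regularized design matrix U = lam * I + sum g g^T.
        self.U_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * feature
        self.rng = random.Random(seed)

    def select(self, features):
        """Sample one reward per action and return the index of the largest."""
        theta = mat_vec(self.U_inv, self.b)  # ridge estimate of the weights
        scores = []
        for g in features:
            mean = dot(theta, g)
            var = max(dot(g, mat_vec(self.U_inv, g)), 0.0)
            scores.append(self.rng.gauss(mean, self.nu * var ** 0.5))
        return max(range(len(features)), key=scores.__getitem__)

    def update(self, g, reward):
        """Rank-one (Sherman-Morrison) update of U_inv, then update b."""
        Ug = mat_vec(self.U_inv, g)
        denom = 1.0 + dot(g, Ug)
        dim = len(g)
        for i in range(dim):
            for j in range(dim):
                self.U_inv[i][j] -= Ug[i] * Ug[j] / denom
        for i in range(dim):
            self.b[i] += reward * g[i]


def run_demo(rounds=500, seed=0):
    """Toy run: three candidate graphs with one-hot features (assumed values)."""
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    true_rewards = [0.1, 0.5, 0.9]  # noise-free rewards, for illustration only
    sampler = FeatureThompsonSampler(dim=3, seed=seed)
    counts = [0, 0, 0]
    for _ in range(rounds):
        k = sampler.select(features)
        sampler.update(features[k], true_rewards[k])
        counts[k] += 1
    return counts
```

Calling `run_demo()` returns how often each candidate was chosen. In this toy setting the sampler should concentrate on the highest-reward action as its posterior variance shrinks. In this sketch, the sampled reward is scalar per action, so the cost of `select` grows with the feature dimension and the number of candidates rather than with the number of nodes in any graph.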

Type

Conference paper

Publication

RLC2024

Shuang Wu

Ph.D. in Statistics
