Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Best AI papers explained - A podcast by Enoch H. Kang

This academic paper presents Bayes-Adaptive Monte-Carlo Planning (BAMCP), a novel algorithm designed to tackle the computational challenges of Bayesian model-based reinforcement learning. The core idea is to use Monte-Carlo tree search within a modified framework that avoids computationally expensive posterior belief updates at every step inside the search tree. Instead, BAMCP employs root sampling, where a single model is drawn from the posterior distribution at the start of each simulation and used unchanged throughout it, and leverages a lazy sampling scheme that instantiates only the model parameters a simulation actually needs. The authors demonstrate through experiments on several benchmark problems, including a challenging infinite-state-space domain, that BAMCP outperforms existing Bayesian reinforcement learning algorithms while retaining asymptotic convergence to the Bayes-optimal policy.
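To make the root-sampling idea concrete, here is a minimal toy sketch (not the authors' implementation, and with a hypothetical `sample_model` interface): each simulation draws one model from the posterior and rolls out under that single fixed model, so no belief updates occur inside the search. A simple UCB1 rule over root actions stands in for the full tree search.

```python
import math
import random

def root_sampling_plan(sample_model, actions, state, n_sims=2000,
                       depth=15, gamma=0.95, c=1.4):
    """Toy root-sampling planner (illustrative sketch, not full BAMCP).

    sample_model() draws one simulator from the posterior; the whole
    simulation then uses that single draw, avoiding per-step posterior
    updates inside the tree.
    """
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}
    for n in range(1, n_sims + 1):
        model = sample_model()  # root sampling: one posterior draw per simulation
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            a0 = untried[0]
        else:  # UCB1 selection at the root
            a0 = max(actions, key=lambda a: totals[a] / counts[a]
                     + c * math.sqrt(math.log(n) / counts[a]))
        # Roll out under the sampled model: first the chosen action,
        # then a uniform-random policy, accumulating discounted reward.
        s, ret, disc, act = state, 0.0, 1.0, a0
        for _ in range(depth):
            s, r = model(s, act)  # model: (state, action) -> (state, reward)
            ret += disc * r
            disc *= gamma
            act = random.choice(actions)
        counts[a0] += 1
        totals[a0] += ret
    return max(actions, key=lambda a: totals[a] / counts[a])

# Toy usage (hypothetical problem): a two-armed Bernoulli bandit whose
# arm probabilities have Beta(1, 1) and Beta(9, 1) posteriors.
random.seed(0)
def sample_model():
    p = [random.betavariate(1, 1), random.betavariate(9, 1)]
    return lambda s, a: (s, 1.0 if random.random() < p[a] else 0.0)

best = root_sampling_plan(sample_model, actions=[0, 1], state=0)
```

With these posteriors, arm 1 has the higher posterior-mean payoff, and the planner selects it after enough simulations. The paper's lazy sampling refinement goes further: rather than drawing a complete model up front, only the transition parameters a simulation actually visits are sampled on demand.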