RL with KL penalties is better viewed as Bayesian inference

Best AI papers explained - A podcast by Enoch H. Kang

This paper proposes a Bayesian inference perspective for understanding and improving fine-tuning methods for language models (LMs). The authors argue that naive Reinforcement Learning (RL) approaches lead to distribution collapse, where the LM generates only a narrow set of high-reward outputs. They show that the commonly used KL-regularized RL objective, which penalizes deviation from the original LM distribution, is equivalent to variational inference, a method for approximating a Bayesian posterior. On this view, aligning an LM with human preferences can be framed as a Bayesian inference problem, which offers a firmer theoretical foundation and helps avoid the pitfalls of standard RL. The paper also emphasizes separating the modeling problem (specifying the desired LM behavior) from the inference problem (approximating that behavior), and argues on this basis that RL is not the most suitable formal framework for LM fine-tuning.
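
As a minimal sketch of the equivalence the episode describes (the notation here is the standard one for this setup, not quoted from the episode: π₀ is the original LM, r the reward, β the KL coefficient):

\[
J(\pi) \;=\; \mathbb{E}_{x \sim \pi}\big[r(x)\big] \;-\; \beta\,\mathrm{KL}\!\left(\pi \,\|\, \pi_0\right)
\]

Maximizing this objective is equivalent to minimizing \(\mathrm{KL}(\pi \,\|\, \pi^*)\), where

\[
\pi^*(x) \;\propto\; \pi_0(x)\,\exp\!\big(r(x)/\beta\big),
\]

since \(J(\pi) = -\beta\,\mathrm{KL}(\pi \,\|\, \pi^*) + \beta \log Z\) with \(Z = \sum_x \pi_0(x)\exp(r(x)/\beta)\) a constant in \(\pi\). In other words, the KL-penalized RL objective is variational inference targeting a Bayesian posterior whose prior is the original LM and whose likelihood term is \(\exp(r(x)/\beta)\).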