Large Language Models Are (Bayesian) Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

Best AI papers explained - A podcast by Enoch H. Kang

This academic paper proposes a novel way to understand how large language models (LLMs) learn from demonstrations provided in the prompt, a process known as in-context learning. The authors take a Bayesian perspective, treating LLMs as implicitly inferring a latent variable that encodes the task information. Building on this theory, they develop an algorithm that selects effective demonstrations by using a smaller LLM to score candidate examples by how strongly they imply this latent concept. Remarkably, demonstrations selected with the small model generalize to larger LLMs, significantly improving performance on text classification and math problems over baseline selection methods and lending empirical support to the hypothesis.
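
To make the selection criterion concrete, here is a minimal Python sketch of ranking candidate demonstrations with a small language model. It is an illustration under simplifying assumptions, not the authors' implementation: the paper fine-tunes the small LM with learnable concept-token embeddings, whereas this sketch substitutes a fixed natural-language description (CONCEPT) as a stand-in, and concept_log_prob and the candidate strings below are hypothetical.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small scoring LM (the paper fine-tunes one with learnable concept
# tokens; a plain GPT-2 serves as a stand-in here).
tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Hypothetical stand-in for the learned latent-concept tokens: a short
# natural-language task description.
CONCEPT = " The task is sentiment classification."

@torch.no_grad()
def concept_log_prob(demonstration: str) -> float:
    """Approximate log P(concept tokens | demonstration) under the small LM."""
    demo_ids = tok(demonstration, return_tensors="pt").input_ids
    concept_ids = tok(CONCEPT, return_tensors="pt").input_ids
    ids = torch.cat([demo_ids, concept_ids], dim=1)
    log_probs = lm(ids).logits.log_softmax(-1)
    total = 0.0
    start = demo_ids.size(1)
    for i in range(concept_ids.size(1)):
        # Logits at position p predict the token at position p + 1.
        token = ids[0, start + i]
        total += log_probs[0, start + i - 1, token].item()
    return total

candidates = [
    "Review: A moving, beautifully acted film. Sentiment: positive",
    "Review: Dull and overlong. Sentiment: negative",
    "Review: I bought this toaster last week.",  # off-task example
]

# Rank candidates by how strongly they imply the latent concept and
# keep the top-k as in-context demonstrations for a larger model.
k = 2
ranked = sorted(candidates, key=concept_log_prob, reverse=True)
prompt = "\n".join(ranked[:k]) + "\nReview: An instant classic. Sentiment:"
print(prompt)
```

The design choice this sketch mirrors is the paper's key practical point: demonstrations are scored and selected once with a cheap model, and the resulting prompt is then handed to whatever larger LLM performs the actual task.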