LLM In-Context Learning as Kernel Regression
Best AI papers explained - A podcast by Enoch H. Kang

This academic paper investigates the mechanism behind in-context learning (ICL) in large language models (LLMs). The authors present a theoretical analysis suggesting that ICL can be understood as kernel regression: the model treats the input-output examples in the prompt as training points and predicts the label of a new query as a similarity-weighted combination of the demonstration labels. Through analysis of attention patterns and experiments across different tasks, the study provides evidence that LLMs allocate significant attention to the demonstration samples, particularly their labels, and that internal key and value vectors store the information needed for this computation. While the kernel regression framework explains phenomena such as the importance of example similarity and output formats, the paper acknowledges that certain aspects of ICL, such as sensitivity to sample order, remain unexplained by this model.
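To make the kernel-regression view concrete, here is a minimal sketch (not the paper's actual derivation) of predicting a query's label as a softmax-weighted average of demonstration labels, where the softmax over similarities plays the role of an attention-like kernel; the function name, toy data, and temperature parameter are illustrative assumptions:

```python
import numpy as np

def softmax_kernel_regression(demo_x, demo_y, query, temperature=1.0):
    """Predict the query's label as a kernel-weighted average of the
    demonstration labels, using an attention-like softmax kernel."""
    # Similarity of the query to each demonstration input (dot product).
    scores = demo_x @ query / temperature
    # Softmax turns similarities into non-negative weights summing to 1,
    # mimicking attention over the in-context examples.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Prediction = similarity-weighted combination of demonstration labels.
    return weights @ demo_y

# Toy demonstrations: two clusters of inputs with labels 0 and 1.
rng = np.random.default_rng(0)
demo_x = np.vstack([rng.normal(-1, 0.1, (5, 4)),
                    rng.normal(1, 0.1, (5, 4))])
demo_y = np.array([0.0] * 5 + [1.0] * 5)

# A query drawn near the second cluster should be predicted close to 1.
query = rng.normal(1, 0.1, 4)
print(softmax_kernel_regression(demo_x, demo_y, query))
```

Because the query is most similar to the label-1 demonstrations, nearly all of the kernel weight falls on them, illustrating why example similarity matters under this view.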