The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

Best AI papers explained - A podcast by Enoch H. Kang

This episode covers research on how transformers learn to predict sequential patterns in context, focusing on Markov chains as a fundamental class of sequences. The paper introduces a task called ICL-MC (in-context learning of Markov chains), in which each input sequence is generated by a different, randomly drawn Markov chain, so the model must infer the chain's transition statistics from context alone. The findings show that transformers develop "statistical induction heads" that estimate next-token probabilities from the sequence's history, achieving near-optimal performance. Notably, training proceeds through distinct phases, from simple uniform predictions to more complex bigram-based ones, with evidence that a bias toward simpler solutions can temporarily slow the learning of more complex patterns. The interaction and alignment of transformer layers prove crucial to this multi-phase learning process.
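To make the setup concrete, here is a minimal sketch of the ICL-MC data and the in-context bigram statistic a statistical induction head would approximate. The function names and the choice of a Dirichlet prior over transition rows are illustrative assumptions, not the paper's exact configuration: each sequence comes from a freshly sampled Markov chain, and the near-optimal predictor counts transitions observed so far out of the current state.

```python
import numpy as np

def sample_markov_sequence(num_states, length, rng):
    """Draw a random Markov chain (rows ~ Dirichlet) and sample one sequence.

    Illustrative: each training sequence uses a fresh transition matrix,
    so the model cannot memorize a single chain and must learn in context.
    """
    P = rng.dirichlet(np.ones(num_states), size=num_states)  # row-stochastic
    seq = [int(rng.integers(num_states))]
    for _ in range(length - 1):
        seq.append(int(rng.choice(num_states, p=P[seq[-1]])))
    return np.array(seq), P

def bigram_predictor(seq, num_states, alpha=1.0):
    """In-context bigram estimate of next-token probabilities.

    Counts transitions seen so far out of the current (last) state, with
    add-alpha smoothing. This is the kind of statistic a trained
    "statistical induction head" approximates.
    """
    counts = np.zeros((num_states, num_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    row = counts[seq[-1]] + alpha
    return row / row.sum()

rng = np.random.default_rng(0)
seq, P = sample_markov_sequence(num_states=3, length=200, rng=rng)
probs = bigram_predictor(seq, num_states=3)  # distribution over the next token
```

With a long enough context, `probs` approaches the true transition row `P[seq[-1]]`, which is why the bigram phase of training reaches near-optimal loss.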