How Transformers Learn Causal Structure with Gradient Descent
Best AI papers explained - A podcast by Enoch H. Kang

This research investigates how transformers learn causal structure through gradient descent, focusing on their ability to perform in-context learning. The authors introduce a novel task of random sequences generated from a latent causal graph and analyze a simplified two-layer transformer architecture. They prove that gradient descent on the first attention layer recovers the hidden causal graph by computing a measure of mutual information between tokens. This learned causal structure then enables in-context estimation of transition probabilities, and the trained model provably generalizes to out-of-distribution data. Experiments on a variety of causal graphs support the theoretical findings.
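
To give a feel for the key idea, here is a minimal sketch (not the paper's construction): sequences are sampled from a hypothetical latent causal graph over positions, and a standard plug-in mutual information estimate between positions stands in for whatever information measure the attention layer is shown to compute. The graph `parent`, the noise level, and all other parameters are illustrative assumptions; the point is simply that the causal parent of each position is the earlier position with the highest mutual information, which is why attention scores that track such a quantity can recover the graph.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, seq_len, n_samples, noise = 8, 6, 20000, 0.1

# Hypothetical latent causal graph over positions: parent[i] is the position
# that position i copies (with noise); -1 marks a root drawn uniformly.
parent = [-1, 0, 1, 1, 3, 2]

def sample_sequence():
    seq = np.zeros(seq_len, dtype=int)
    for i in range(seq_len):
        if parent[i] == -1 or rng.random() < noise:
            seq[i] = rng.integers(vocab)   # root or noise: uniform token
        else:
            seq[i] = seq[parent[i]]        # otherwise copy the parent's token
    return seq

data = np.stack([sample_sequence() for _ in range(n_samples)])

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats from paired token samples."""
    joint = np.zeros((vocab, vocab))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# For each non-root position, the earlier position with maximal estimated
# mutual information should coincide with its true causal parent.
for i in range(1, seq_len):
    mi = [mutual_information(data[:, j], data[:, i]) for j in range(i)]
    print(f"position {i}: true parent {parent[i]}, argmax-MI parent {int(np.argmax(mi))}")
```

In this toy setting the argmax over mutual information recovers every parent; the paper's contribution is showing that gradient descent on the first attention layer of a simplified transformer arrives at an analogous information-based selection of parents, which the second layer can then use for in-context estimation of transition probabilities.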