Causal Interpretation of Transformer Self-Attention
Best AI papers explained - A podcast by Enoch H. Kang

This research proposes a novel approach to understanding the self-attention mechanism in Transformer neural networks, interpreting it through the lens of structural causal models (SCMs). By viewing self-attention as estimating an SCM over the tokens of an input sequence, the authors show that pre-trained Transformers can perform zero-shot causal discovery, even in the presence of unobserved confounders. A causal structure is learned for each individual input sequence by analyzing its attention matrix, and that structure is then used to provide causal explanations for the Transformer's outputs in tasks such as sentiment classification and recommendation. The proposed method, called CLEANN, is shown to produce smaller and more specific explanation sets than baseline approaches.
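To make the idea concrete, here is a minimal sketch of the kind of analysis described: pulling the self-attention matrix from a pre-trained Transformer and thresholding it into a candidate dependency graph over the input tokens. This is only an illustration, not the paper's CLEANN procedure (which derives the structure through a principled causal-discovery analysis rather than a simple cutoff); the Hugging Face `transformers` library, the `bert-base-uncased` model, and the 0.1 threshold are assumptions chosen for the example.

```python
# Illustrative sketch, NOT the paper's CLEANN method: read a per-sequence
# self-attention matrix from a pre-trained Transformer and threshold it into
# a rough token-to-token dependency graph.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"  # assumed model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "The battery life is great but the screen is dim."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per
# layer; average the last layer's heads into a single token-by-token matrix.
attn = outputs.attentions[-1].mean(dim=1)[0]  # shape: (seq, seq)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
threshold = 0.1  # illustrative cutoff, not a value from the paper

# Keep token pairs whose attention weight exceeds the cutoff and treat the
# attended-to token as a candidate influence on the attending token.
for i in range(len(tokens)):
    for j in range(len(tokens)):
        if i != j and attn[i, j] > threshold:
            print(f"{tokens[j]} -> {tokens[i]}  (attention {attn[i, j]:.2f})")
```

Because the attention matrix is computed per input sequence, the resulting graph differs from sentence to sentence, which is what allows explanations to be specific to each individual input rather than to the model as a whole.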