When and Why LLMs Fail to Reason Globally

Best AI papers explained - A podcast by Enoch H. Kang

This research explores why Large Language Models (LLMs) struggle with tasks requiring global reasoning over long inputs. The authors propose that these limitations stem from constraints on information flow within LLMs, formalizing this with the Bounded Attention Prefix Oracle (BAPO) model. They classify problems as BAPO-easy or BAPO-hard, predicting that LLMs will fail on the latter. Empirical results with models like GPT-4o, Claude, and Gemini support this prediction, showing poor performance on BAPO-hard tasks even for relatively small inputs. Crucially, the paper demonstrates theoretically and empirically that using Chain of Thought (CoT) reasoning can transform BAPO-hard problems into BAPO-easy ones, significantly improving performance despite potentially high token usage.
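As a rough intuition for the information-flow argument, the toy Python sketch below illustrates the idea in miniature. It is not the paper's formal BAPO construction: the bandwidth model, the use of a majority count as a stand-in for a BAPO-hard task, and all function names are illustrative assumptions. The point it mimics is that a single pass with a bounded summary of the prefix can lose the information a global answer needs, whereas writing intermediate results CoT-style lets each step succeed with only a small amount of new information.

```python
"""Toy, hedged sketch of the bounded information-flow intuition.

NOT the paper's formal BAPO model: the bandwidth mechanism, the majority
task, and all names here are illustrative assumptions.
"""

import random


def prefix_summary(prefix_bits, bandwidth):
    """Compress the prefix into a summary of at most `bandwidth` bits.

    In this toy model, the part of the computation reading the suffix can
    only see this summary, mimicking a bounded prefix-to-suffix channel.
    """
    count = sum(prefix_bits)
    # A bandwidth-bit summary can distinguish at most 2**bandwidth values,
    # so large counts get clipped and their information is lost.
    return min(count, 2 ** bandwidth - 1)


def majority_without_cot(bits, split, bandwidth):
    """One-shot answer to 'is the majority of bits 1?'.

    The suffix half sees its own bits plus only the bounded prefix summary,
    so once the true prefix count exceeds the summary's capacity the global
    count is underestimated and the answer can flip.
    """
    ones = prefix_summary(bits[:split], bandwidth) + sum(bits[split:])
    return ones > len(bits) / 2


def majority_with_cot(bits, chunk):
    """CoT-style decomposition: emit a running count after each chunk.

    The running count plays the role of intermediate tokens the model writes
    out, so each step only needs a small, local amount of new information
    rather than a high-bandwidth view of the whole prefix.
    """
    running = 0
    for start in range(0, len(bits), chunk):
        running += sum(bits[start:start + chunk])
    return running > len(bits) / 2


if __name__ == "__main__":
    random.seed(0)
    n, bandwidth = 4096, 4  # 4 bits cannot encode a count that may reach n/2
    bits = [random.randint(0, 1) for _ in range(n)]
    print("ground truth:    ", sum(bits) > n / 2)
    print("bounded one-shot:", majority_without_cot(bits, n // 2, bandwidth))
    print("CoT-style chunks:", majority_with_cot(bits, chunk=64))
```

In this sketch the one-shot variant fails once the clipped prefix summary hides most of the count, while the chunked variant recovers the correct answer because each step is, loosely speaking, "BAPO-easy"; this mirrors the paper's claim that CoT can convert a BAPO-hard problem into a sequence of BAPO-easy ones, at the cost of extra tokens.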