From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Best AI papers explained - A podcast by Enoch H. Kang

This academic paper surveys methods for improving the outputs of large language models (LLMs) at inference time, after training is complete. It organizes these techniques into three core areas: token-level generation algorithms, which govern how individual tokens are chosen; meta-generation algorithms, which structure multiple generation calls into a larger procedure; and strategies for efficient generation that reduce both token cost and latency. The work formalizes the objectives that different generation approaches optimize and discusses how external information, such as other models or tools, can be incorporated to improve output quality. The authors also analyze the cost-performance tradeoffs of these algorithms and highlight directions for future research.
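To make the distinction between the first two categories concrete, here is a minimal, self-contained Python sketch. It is not code from the paper: `toy_next_token_probs`, `score`, and the other names are hypothetical stand-ins for a real language model and verifier. Temperature sampling illustrates a token-level algorithm, while `best_of_n` illustrates a simple meta-generation strategy that wraps several independent generation calls and reranks them.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "mat", "<eos>"]

def toy_next_token_probs(prefix):
    """Hypothetical stand-in for an LLM's next-token distribution."""
    # Favor tokens the prefix hasn't used yet, just to vary the outputs.
    logits = [1.0 if tok not in prefix else 0.2 for tok in VOCAB]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]

def sample_token(probs, temperature=1.0):
    """Token-level generation: temperature sampling from one distribution."""
    if temperature == 0:  # greedy decoding as the zero-temperature limit
        return max(range(len(probs)), key=lambda i: probs[i])
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def sample_sequence(max_len=6, temperature=1.0):
    """Generate one sequence by repeatedly applying the token-level sampler."""
    seq = []
    for _ in range(max_len):
        probs = toy_next_token_probs(seq)
        tok = VOCAB[sample_token(probs, temperature)]
        if tok == "<eos>":
            break
        seq.append(tok)
    return seq

def score(seq):
    """Hypothetical quality scorer; a real system might use a reward model."""
    return len(set(seq))  # toy score: reward lexically diverse sequences

def best_of_n(n=8, temperature=1.0):
    """Meta-generation: structure n generation calls into a larger procedure,
    then return the candidate the scorer prefers."""
    candidates = [sample_sequence(temperature=temperature) for _ in range(n)]
    return max(candidates, key=score)

print("one sample:", sample_sequence())
print("best of 8: ", best_of_n(8))
```

Under these assumptions, the cost-performance tradeoff the paper analyzes is visible even in the toy: `best_of_n` spends n times the tokens of a single sample in exchange for a higher-scoring output.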