Inference-time alignment in continuous space
Best AI papers explained - A podcast by Enoch H. Kang

This academic paper introduces Simple Energy Adaptation (SEA), a novel method for aligning large language models (LLMs) with human preferences at inference time. Unlike traditional methods that rely on discrete search over a limited set of responses sampled from the base model, SEA formulates alignment as an iterative optimization process in a continuous latent space. By applying gradient-based Langevin dynamics to the continuous output logits, guided by an energy function derived from the optimal RLHF policy, SEA explores the space of potential responses more effectively. Experimental results on tasks spanning safety, truthfulness, and reasoning show that SEA significantly outperforms existing search-based techniques, even those using larger candidate sets, highlighting the advantages of continuous optimization for inference-time LLM alignment. The paper also analyzes how SEA mitigates the issue of "shallow alignment," promoting a balanced distribution of alignment effort across the entire output.
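
To make the core idea concrete, here is a minimal, hypothetical sketch of Langevin dynamics applied to continuous output logits, not the authors' implementation. The function names, the toy reward vector, and the proximity penalty standing in for the energy function derived from the optimal RLHF policy are all assumptions made purely for illustration.

```python
import torch

def langevin_align(logits, energy_fn, steps=50, step_size=0.1):
    """Sketch of gradient-based Langevin dynamics on continuous logits.

    Each step drifts the logits toward low-energy (preference-aligned)
    regions and adds Gaussian noise so the search stays exploratory
    instead of collapsing onto a single candidate.
    """
    x = logits.clone().detach().requires_grad_(True)
    for _ in range(steps):
        energy = energy_fn(x)                       # scalar alignment energy
        grad, = torch.autograd.grad(energy, x)      # dE/dx via autograd
        noise = torch.randn_like(x)
        with torch.no_grad():
            x -= 0.5 * step_size * grad             # drift toward low energy
            x += (step_size ** 0.5) * noise         # diffusion (exploration) term
    return x.detach()


if __name__ == "__main__":
    # Toy usage with a hypothetical energy: a reward term plus a penalty that
    # keeps the sample close to the base model's logits, loosely mirroring the
    # reward-plus-proximity structure of the optimal RLHF policy.
    vocab_size, seq_len = 100, 8
    base_logits = torch.randn(seq_len, vocab_size)
    reward_vector = torch.randn(vocab_size)         # placeholder preference signal

    def energy_fn(x):
        probs = torch.softmax(x, dim=-1)
        reward = (probs * reward_vector).sum()
        closeness = ((x - base_logits) ** 2).mean()
        return -reward + closeness                  # low energy = aligned yet close to base

    aligned_logits = langevin_align(base_logits, energy_fn)
    print(aligned_logits.shape)                     # torch.Size([8, 100])
```

The noise term is what distinguishes this from plain gradient descent: it lets the procedure wander between modes of the energy landscape, which is the continuous analogue of searching over many candidate responses at once.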