FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain
Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces FisherSFT, a method for making supervised fine-tuning (SFT) of large language models (LLMs) more data-efficient by selecting the most informative training examples. The key idea is to choose examples that maximize information gain, approximated by evaluating the Hessian of the LLM's log-likelihood. To keep this tractable, the method linearizes the LLM's last layer and uses a greedy algorithm to select the sentences with the highest information gain. The authors provide a theoretical analysis bounding the prediction error and empirical results demonstrating FisherSFT's superiority over baseline sampling methods on synthetic and real-world datasets, with experiments using models including GPT-2.
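To make the selection rule concrete, here is a minimal sketch (not the authors' code) of greedy information-gain selection in the D-optimal-design style the summary describes, assuming each candidate sentence is summarized by a single last-layer embedding; the paper's actual method linearizes the LLM's last layer and works with the Hessian of the log-likelihood rather than this simplified outer-product surrogate.

```python
import numpy as np

def greedy_fisher_selection(features, budget, reg=1e-3):
    """Greedily pick `budget` examples that maximize information gain.

    features: (n, d) array, one embedding per candidate sentence
              (a simplification of the paper's per-token linearization).
    The gain from adding x to the design matrix A is
    log det(A + x x^T) - log det(A) = log(1 + x^T A^{-1} x),
    so each step picks the sentence with the largest x^T A^{-1} x.
    """
    n, d = features.shape
    A_inv = np.eye(d) / reg          # inverse of the regularized information matrix
    selected, remaining = [], list(range(n))

    for _ in range(budget):
        # Score every remaining candidate by its marginal information gain.
        gains = np.array([features[i] @ A_inv @ features[i] for i in remaining])
        best = remaining[int(np.argmax(gains))]
        selected.append(best)
        remaining.remove(best)

        # Sherman-Morrison rank-one update of A^{-1} after adding the example.
        x = features[best]
        Ax = A_inv @ x
        A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)

    return selected

# Toy usage: 1000 candidate sentences with 64-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))
chosen = greedy_fisher_selection(embeddings, budget=50)
print(chosen[:10])
```

The rank-one (Sherman-Morrison) update keeps each greedy step cheap, which mirrors why the paper's last-layer linearization makes the overall selection computationally efficient.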