Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces Disagreement-Aware Confidence Alignment (DACA), an unsupervised method for calibrating the confidence of post-trained large language models (PoLMs). While pre-trained language models (PLMs) are typically well calibrated, post-training can lead to over-confidence, especially when labeled data is limited. DACA addresses this by leveraging the well-calibrated confidence of PLMs on unlabeled data, optimizing calibration parameters only on examples where the PLM and PoLM predictions agree. Restricting calibration to these agreed examples avoids the negative impact of prediction disagreement, yielding more accurate confidence scores for PoLMs and improved performance across benchmarks and model sizes, including open-ended question answering and selective classification.
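To make the agreement-filtering idea concrete, here is a minimal sketch (not the authors' code) of how one might align a PoLM's confidence to a PLM's on unlabeled data. It assumes temperature scaling as the calibration parameterization and a squared-error alignment objective; the function and variable names are hypothetical.

```python
# Hypothetical sketch of agreement-aware confidence alignment.
# Inputs (all on unlabeled data): PLM predicted labels and confidences,
# PoLM predicted labels and raw logits.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def fit_temperature_daca(plm_preds, plm_confs, polm_preds, polm_logits):
    """Fit a temperature T for the PoLM using only examples where the
    PLM and PoLM predictions agree, aligning PoLM confidence to the
    PLM's (assumed well-calibrated) confidence."""
    agree = plm_preds == polm_preds           # agreement mask on unlabeled examples
    logits = polm_logits[agree]               # PoLM logits where the two models agree
    target = plm_confs[agree]                 # PLM confidence used as the alignment target

    def alignment_loss(temperature):
        probs = softmax(logits / temperature, axis=-1)  # temperature-scaled PoLM probabilities
        conf = probs.max(axis=-1)                        # PoLM confidence in its own prediction
        return np.mean((conf - target) ** 2)             # align to PLM confidence (assumed objective)

    result = minimize_scalar(alignment_loss, bounds=(0.05, 10.0), method="bounded")
    return result.x  # calibration temperature applied to PoLM logits at inference
```

The key design point the paper emphasizes is the agreement mask: calibrating only where the two models predict the same label keeps the PLM's confidence a sensible target and sidesteps the corrupting effect of disagreement cases.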