Calibrating the Confidence of Large Language Models by Eliciting Fidelity
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, after alignment, these models often exhibit overconfidence: their expressed confidence is not well calibrated with their actual correctness rate. In this paper, we decompose language model confidence into the *Uncertainty* about the question and the *Fidelity* to the answer generated by the language model. We then propose a plug-and-play method to estimate the confidence of language models. In experiments with six RLHF-LMs on four MCQA datasets, our method shows good calibration performance. Moreover, we propose two novel metrics, IPR and CE, to evaluate model calibration, and we provide a detailed discussion of *Truly Well-Calibrated Confidence*. Our method can serve as a strong baseline, and we hope this work offers some insight into model confidence calibration.
Comment: 17 pages, 13 figures
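The abstract contrasts a model's expressed confidence with its correctness rate. The paper's own metrics (IPR and CE) are not detailed in this record, so the sketch below instead illustrates the general notion with the standard Expected Calibration Error (ECE): bin predictions by confidence and compare each bin's mean confidence with its accuracy. All function and variable names here are illustrative, not from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between mean confidence and accuracy per bin.

    confidences: list of floats in [0, 1] (expressed confidence per answer)
    correct:     list of 0/1 flags (whether each answer was right)
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # half-open bins (lo, hi]; the first bin also includes 0.0
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# Toy example of the overconfidence the abstract describes: the model
# expresses 0.9 confidence but is only right half the time.
confs = [0.9, 0.9, 0.9, 0.9]
hits = [1, 0, 1, 0]
print(expected_calibration_error(confs, hits))  # 0.4
```

A perfectly calibrated model would put confidence equal to accuracy in every bin, giving an ECE of zero; the 0.4 gap above quantifies the miscalibration.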
| Title | Calibrating the Confidence of Large Language Models by Eliciting Fidelity |
|---|---|
| Author(s) / contributors | Zhang, Mozhi; Huang, Mianqiu; Shi, Rundong; Guo, Linsen; Peng, Chong; Yan, Peng; Zhou, Yaqian; Qiu, Xipeng |
| Published | 2024 |
| Media type | report |