Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs

  • Duygu Nur Yaldiz
  • Yavuz Faruk Bakman
  • Baturalp Buyukates
  • Chenyang Tao
  • Anil Ramakrishna
  • Dimitrios Dimitriadis
  • Jieyu Zhao
  • Salman Avestimehr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Uncertainty estimation (UE) of generative large language models (LLMs) is crucial for evaluating the reliability of generated sequences. A significant subset of UE methods uses token probabilities to assess uncertainty, aggregating the probabilities of multiple tokens into a single UE score through a scoring function. Existing scoring functions for probability-based UE, such as length-normalized scoring and semantic contribution-based weighting, are designed to address specific aspects of the problem but have limitations, including the inability to handle biased probabilities and complex semantic dependencies between tokens. To address these issues, we propose the Learnable Response Scoring (LARS) function, a novel scoring function that leverages supervised data to capture complex dependencies between tokens and probabilities, thereby producing more reliable and calibrated response scores for computing the uncertainty of LLM generations. Our comprehensive experiments on question-answering and arithmetical reasoning tasks across various datasets demonstrate that LARS significantly outperforms existing scoring functions, achieving improvements of up to 16% in AUROC score.
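To make the aggregation step concrete, here is a minimal sketch of length-normalized scoring, one of the existing baseline scoring functions the abstract names. It is not LARS, which instead learns the aggregation from supervised data. The function names and the token probabilities are hypothetical, and the sketch assumes the per-token probabilities of each generated response are already available as floats in (0, 1].

```python
import math
from typing import Sequence


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """Unnormalized score: sum of the token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def length_normalized_score(token_probs: Sequence[float]) -> float:
    """Length-normalized score: mean token log-probability.

    exp() of this value is the geometric mean of the token probabilities.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    return sequence_log_prob(token_probs) / len(token_probs)


def uncertainty_from_score(score: float) -> float:
    """Hypothetical convention: a higher confidence score means lower uncertainty."""
    return -score


# Hypothetical per-token probabilities for two generated responses.
short_response = [0.9, 0.8]
long_response = [0.9, 0.8, 0.95, 0.95, 0.9]

for name, probs in [("short", short_response), ("long", long_response)]:
    raw = sequence_log_prob(probs)
    norm = length_normalized_score(probs)
    print(f"{name}: raw={raw:.3f} normalized={norm:.3f} "
          f"uncertainty={uncertainty_from_score(norm):.3f}")
```

With these made-up numbers, the raw summed log-probability ranks the five-token response as less confident simply because it has more tokens, whereas the length-normalized score ranks it as more confident. Either way, a fixed rule like this weights every token equally regardless of its semantic role, which relates to the limitations the abstract attributes to hand-designed scoring functions.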

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2025
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Publisher: Association for Computational Linguistics, ACL
Pages: 691-713
Number of pages: 23
ISBN (Electronic): 9798891761957
Publication status: Published, Apr 2025
Event: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025, Albuquerque, United States
Duration: 29 Apr 2025 to 4 May 2025

Conference

Conference: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
Country/Territory: United States
City: Albuquerque
Period: 29/04/25 to 4/05/25

Bibliographical note

Publisher Copyright:
©2025 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software
