Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apologies

Danni Yu*, Luyang Li, Hang Su, Matteo Fuoli

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Certain forms of linguistic annotation, such as part-of-speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still required for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores the possibility of using large language models (LLMs) to automate pragma-discursive corpus annotation. We compare GPT-3.5 (the model behind the free-to-use version of ChatGPT), GPT-4 (the model underpinning the precise mode of the Bing chatbot) and a human coder in annotating apology components in English based on the local grammar framework. We find that GPT-4 outperforms GPT-3.5, with accuracy approaching that of the human coder. These results suggest that LLMs can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient, scalable and accessible.
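For readers who want to experiment with a comparable workflow, the sketch below shows one way an apology could be submitted to an LLM for component-level annotation via the OpenAI API. It is a minimal illustration only: the prompt wording, the component labels (Apologiser, Apologising, Intensifier, Apologisee, Reason) and the model name are assumptions for demonstration purposes, not the authors' actual prompt or annotation scheme.

```python
# Minimal sketch of LLM-assisted apology annotation, assuming the OpenAI
# Python client (openai >= 1.0). Prompt wording and component labels are
# illustrative, not the authors' exact procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Annotate the apology below using these local grammar components: "
    "Apologiser, Apologising, Intensifier, Apologisee, Reason. "
    "Return one line per component in the form 'Component: text span', "
    "omitting components that are not present.\n\nApology: {text}"
)

def annotate_apology(text: str, model: str = "gpt-4") -> str:
    """Send a single apology to the model and return its raw annotation."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # deterministic output aids annotation consistency
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(annotate_apology("I am so sorry for the late reply, I was travelling."))
```

In practice, the model's output would then be compared against human annotations to measure agreement, as the study does when benchmarking GPT-3.5 and GPT-4 against a human coder.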
Original language: English
Journal: International Journal of Corpus Linguistics
Early online date: 3 Jun 2024
Publication status: E-pub ahead of print - 3 Jun 2024

Keywords

  • corpus pragmatics
  • large language models
  • pragma-discursive corpus annotation
  • local grammar
  • ChatGPT

