Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apologies

Danni Yu*, Luyang Li, Hang Su, Matteo Fuoli

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Certain forms of linguistic annotation, such as part-of-speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still required for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores the possibility of using large language models (LLMs) to automate pragma-discursive corpus annotation. We compare GPT-3.5 (the model behind the free-to-use version of ChatGPT), GPT-4 (the model underpinning the precise mode of the Bing chatbot) and a human coder in annotating apology components in English based on the local grammar framework. We find that GPT-4 outperforms GPT-3.5, with accuracy approaching that of the human coder. These results suggest that LLMs can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient, scalable and accessible.
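For readers who want to experiment with a comparable workflow, the sketch below shows one way an apology could be submitted to an LLM for component-level annotation via the OpenAI API. It is a minimal illustration only: the prompt wording, the component labels (Apologiser, Apologising, Intensifier, Apologisee, Reason) and the model name are assumptions for demonstration purposes, not the authors' actual prompt or annotation scheme.

```python
# Minimal sketch of LLM-assisted apology annotation, assuming the OpenAI
# Python client (openai >= 1.0). Prompt wording and component labels are
# illustrative, not the authors' exact procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Annotate the apology below using these local grammar components: "
    "Apologiser, Apologising, Intensifier, Apologisee, Reason. "
    "Return one line per component in the form 'Component: text span', "
    "omitting components that are not present.\n\nApology: {text}"
)

def annotate_apology(text: str, model: str = "gpt-4") -> str:
    """Send a single apology to the model and return its raw annotation."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # deterministic output aids annotation consistency
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(annotate_apology("I am so sorry for the late reply, I was travelling."))
```

In practice, the model's output would then be compared against human annotations to measure agreement, as the study does when benchmarking GPT-3.5 and GPT-4 against a human coder.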
Original language: English
Journal: International Journal of Corpus Linguistics
Early online date: 3 Jun 2024
Publication status: E-pub ahead of print - 3 Jun 2024

Keywords

  • corpus pragmatics
  • large language models
  • pragma-discursive corpus annotation
  • local grammar
  • ChatGPT

