Poisson regression for linguists: a tutorial introduction to modelling count data with brms

Bodo Winter, Paul-Christian Bürkner

Research output: Contribution to journal › Article › peer-review

Abstract

Count data is prevalent in many different areas of linguistics, such as when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterising count data with no or unknown upper bound. Given the prevalence of count data in linguistics, Poisson regression has wide utility no matter what subfield of linguistics is considered. However, in contrast to logistic regression, Poisson regression is surprisingly little known. Here, we make a case for why linguists need to consider Poisson regression, and give recommendations for when Poisson regression is more appropriate compared to logistic regression. This tutorial introduces readers to foundational concepts needed to understand the basics of Poisson regression, followed by a hands-on tutorial using the R package brms. We discuss a dataset where Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness contexts. This dataset also involves exposure variables (the incorporation of time to deal with unequal intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.
Original language: English
Article number: e12439
Number of pages: 23
Journal: Language and Linguistics Compass
Volume: 15
Issue number: 11
Early online date: 16 Nov 2021
Publication status: Published - Nov 2021
