Poisson regression for linguists: a tutorial introduction to modelling count data with brms

Bodo Winter; Paul-Christian Bürkner

doi:10.1111/lnc3.12439

English Language and Linguistics

Research output: Contribution to journal › Article › peer-review

Abstract

Count data is prevalent in many different areas of linguistics, such as when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterising count data with no or unknown upper bound. Given the prevalence of count data in linguistics, Poisson regression has wide utility no matter what subfield of linguistics is considered. However, in contrast to logistic regression, Poisson regression is surprisingly little known. Here, we make a case for why linguists need to consider Poisson regression, and give recommendations for when Poisson regression is more appropriate compared to logistic regression. This tutorial introduces readers to foundational concepts needed to understand the basics of Poisson regression, followed by a hands-on tutorial using the R package brms. We discuss a dataset where Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness contexts. This dataset also involves exposure variables (the incorporation of time to deal with unequal intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.
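Three ideas in the abstract can be made concrete with a small numerical sketch: a Poisson likelihood for counts with no fixed upper bound, an exposure variable (for example, observation time), and overdispersion. The sketch below is a minimal illustration in Python, not the paper's R/brms code. It assumes the standard log link and enters exposure as a log offset, which is one common way of incorporating time when intervals are unequal. The function names (`poisson_pmf`, `expected_count`, `dispersion_ratio`) and all coefficients and durations are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def poisson_pmf(k: int, rate: float) -> float:
    """P(Y = k) for Y ~ Poisson(rate); rate must be > 0.
    Computed on the log scale (lgamma(k + 1) == log(k!)) for numerical stability."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def expected_count(intercept: float, slope: float, x: float, exposure: float) -> float:
    """Mean count under a log link with an exposure offset:
        log(mu) = intercept + slope * x + log(exposure)
    The offset's coefficient is fixed at 1, so mu scales linearly with exposure,
    and intercept/slope describe the log *rate* per unit of exposure."""
    return math.exp(intercept + slope * x + math.log(exposure))


def dispersion_ratio(counts: Sequence[int], means: Sequence[float], n_params: int) -> float:
    """Pearson chi-square statistic divided by residual degrees of freedom.
    A Poisson model implies variance == mean, so values well above 1 are an
    informal sign of overdispersion (excess variance)."""
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, means))
    return pearson / (len(counts) - n_params)


if __name__ == "__main__":
    # Hypothetical coefficients and durations; not estimates from the paper.
    one_min = expected_count(0.5, 0.2, x=1.0, exposure=1.0)
    two_min = expected_count(0.5, 0.2, x=1.0, exposure=2.0)
    # Doubling the exposure doubles the expected count (up to rounding).
    print(f"{one_min:.3f} {two_min:.3f}")
    print(f"P(Y=0 | rate={one_min:.3f}) = {poisson_pmf(0, one_min):.3f}")
    # Counts far more spread out than their means would suggest under a Poisson model.
    print(dispersion_ratio([0, 10, 1, 9], [5.0, 5.0, 5.0, 5.0], n_params=1))
```

Because the offset enters with a fixed coefficient of 1, a speaker observed for twice as long is expected to produce twice as many gestures at the same underlying rate; the dispersion ratio is only a rough diagnostic, not a formal test.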

Original language	English
Article number	e12439
Number of pages	23
Journal	Language and Linguistics Compass
Volume	15
Issue number	11
Early online date	16 Nov 2021
DOIs	https://doi.org/10.1111/lnc3.12439
Publication status	Published - Nov 2021

Access to Document

10.1111/lnc3.12439 (Licence: Creative Commons: Attribution (CC BY))

WinterB2021Poisson: Final published version, 2.14 MB (Licence: Creative Commons: Attribution (CC BY))

https://onlinelibrary.wiley.com/doi/10.1111/lnc3.12439 (Licence: Creative Commons: Attribution (CC BY))

Cite this

@article{79531e266c5b483fb68e9d9794e33072,
  title = "{Poisson} regression for linguists: a tutorial introduction to modelling count data with {brms}",
  abstract = "Count data is prevalent in many different areas of linguistics, such as when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterising count data with no or unknown upper bound. Given the prevalence of count data in linguistics, Poisson regression has wide utility no matter what subfield of linguistics is considered. However, in contrast to logistic regression, Poisson regression is surprisingly little known. Here, we make a case for why linguists need to consider Poisson regression, and give recommendations for when Poisson regression is more appropriate compared to logistic regression. This tutorial introduces readers to foundational concepts needed to understand the basics of Poisson regression, followed by a hands-on tutorial using the R package brms. We discuss a dataset where Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness contexts. This dataset also involves exposure variables (the incorporation of time to deal with unequal intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.",
  author = "Bodo Winter and Paul-Christian B{\"u}rkner",
  year = "2021",
  month = nov,
  doi = "10.1111/lnc3.12439",
  language = "English",
  volume = "15",
  journal = "Language and Linguistics Compass",
  issn = "1749-818X",
  publisher = "Wiley",
  number = "11",
  pages = "e12439",
}

TY  - JOUR
T1  - Poisson regression for linguists: a tutorial introduction to modelling count data with brms
AU  - Winter, Bodo
AU  - Bürkner, Paul-Christian
PY  - 2021/11
Y1  - 2021/11
AB  - Count data is prevalent in many different areas of linguistics, such as when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterising count data with no or unknown upper bound. Given the prevalence of count data in linguistics, Poisson regression has wide utility no matter what subfield of linguistics is considered. However, in contrast to logistic regression, Poisson regression is surprisingly little known. Here, we make a case for why linguists need to consider Poisson regression, and give recommendations for when Poisson regression is more appropriate compared to logistic regression. This tutorial introduces readers to foundational concepts needed to understand the basics of Poisson regression, followed by a hands-on tutorial using the R package brms. We discuss a dataset where Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness contexts. This dataset also involves exposure variables (the incorporation of time to deal with unequal intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.
U2  - 10.1111/lnc3.12439
DO  - 10.1111/lnc3.12439
M3  - Article
SN  - 1749-818X
VL  - 15
JO  - Language and Linguistics Compass
JF  - Language and Linguistics Compass
IS  - 11
M1  - e12439
ER  - 