TY - JOUR
T1 - Machine meets man
T2 - evaluating the psychological reality of corpus-based probabilistic models
AU - Divjak, Dagmar
AU - Dabrowska, Ewa
AU - Arppe, Antti
PY - 2016/2/1
Y1 - 2016/2/1
N2 - Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may impact the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.
KW - statistical models
KW - psychological reality
KW - forced-choice task
KW - acceptability ratings
KW - synonymy
UR - https://www.academia.edu/20371651/Man_Meets_Machine._Evaluating_the_psychological_reality_of_corpus-based_probabilistic_models
UR - http://eprints.whiterose.ac.uk/90779/
U2 - 10.1515/cog-2015-0101
DO - 10.1515/cog-2015-0101
M3 - Article
SN - 0936-5907
VL - 27
SP - 1
EP - 33
JO - Cognitive Linguistics
JF - Cognitive Linguistics
IS - 1
ER -