Machine meets man: evaluating the psychological reality of corpus-based probabilistic models

Dagmar Divjak; Ewa Dabrowska; Antti Arppe

doi:10.1515/cog-2015-0101

Research output: Contribution to journal › Article › peer-review

Abstract

Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may impact the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.

Original language	English
Pages (from-to)	1-33
Number of pages	33
Journal	Cognitive Linguistics
Volume	27
Issue number	1
Early online date	6 Jan 2016
DOIs	https://doi.org/10.1515/cog-2015-0101
Publication status	Published - 1 Feb 2016

Keywords

statistical models
psychological reality
forced-choice task
acceptability ratings
synonymy

Access to Document

10.1515/cog-2015-0101
Licence: None: All rights reserved

Divjak_Machine_meets_man_cognitive_linguistics_2016
Final published version, 6.2 MB
Licence: None: All rights reserved

Cite this

@article{40e26752d3584789a20ce84f49cb27c4,
title = "Machine meets man: evaluating the psychological reality of corpus-based probabilistic models",
abstract = "Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may impact the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.",
keywords = "statistical models, psychological reality, forced-choice task, acceptability ratings, synonymy",
author = "Dagmar Divjak and Ewa Dabrowska and Antti Arppe",
year = "2016",
month = feb,
day = "1",
doi = "10.1515/cog-2015-0101",
language = "English",
volume = "27",
pages = "1--33",
journal = "Cognitive Linguistics",
issn = "0936-5907",
publisher = "De Gruyter",
number = "1",
}

TY - JOUR
T1 - Machine meets man
T2 - evaluating the psychological reality of corpus-based probabilistic models
AU - Divjak, Dagmar
AU - Dabrowska, Ewa
AU - Arppe, Antti
PY - 2016/2/1
Y1 - 2016/2/1
N2 - Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may impact the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.
KW - statistical models
KW - psychological reality
KW - forced-choice task
KW - acceptability ratings
KW - synonymy
UR - https://www.academia.edu/20371651/Man_Meets_Machine._Evaluating_the_psychological_reality_of_corpus-based_probabilistic_models
UR - http://eprints.whiterose.ac.uk/90779/
U2 - 10.1515/cog-2015-0101
DO - 10.1515/cog-2015-0101
M3 - Article
SN - 0936-5907
VL - 27
SP - 1
EP - 33
JO - Cognitive Linguistics
JF - Cognitive Linguistics
IS - 1
ER -
