Machine meets man: evaluating the psychological reality of corpus-based probabilistic models

Research output: Contribution to journal, Article, peer-review

14 Citations (Scopus)
205 Downloads (Pure)

Abstract

Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work, we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may impact the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.
Original language: English
Pages (from-to): 1-33
Number of pages: 33
Journal: Cognitive Linguistics
Volume: 27
Issue number: 1
Early online date: 6 Jan 2016
Publication status: Published - 1 Feb 2016

Keywords

  • statistical models
  • psychological reality
  • forced-choice task
  • acceptability ratings
  • synonymy
