The informativeness of linguistic unit boundaries

Jeroen Geertzen, James P. Blevins, Petar Milin

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)
48 Downloads (Pure)

Abstract

Contemporary models of structural analysis tend to operate with discrete units at different linguistic levels. There is, however, considerable debate regarding the choice of units and the validity of the cues that guide their demarcation. At the level of grammatical analysis, this debate focuses largely on the status of words vs sub-word units and on the generality of the linguistic properties that mark each type of unit. This paper suggests that the status of a unit type can be evaluated in terms of its informativity. A measure of informativity is obtained by assessing the influence that different unit boundary types have on text compressibility. The results obtained from this initial study support a pair of general conclusions. The first is that unit boundaries primarily reflect a statistical structure, and that the typological variability of linguistic cues reflects the fact that they serve a secondary reinforcing function. The second is that word boundaries are the most informative boundary type, and that the demarcation of words provides the most informative description of the regular patterns in a language.
Original language: English
Pages (from-to): 25-48
Number of pages: 23
Journal: Italian Journal of Linguistics
Volume: 28
Issue number: 1
Publication status: Published - 8 Jan 2016

Keywords

  • linguistic units
  • words
  • abstractive perspective
  • information theory
  • Shannon entropy
  • Kolmogorov complexity
