Abstract
Contemporary models of structural analysis tend to operate with discrete units at different linguistic levels. There is, however, considerable debate regarding the choice of units and the validity of the cues that guide their demarcation. At the level of grammatical analysis, this debate focuses largely on the status of words vs. sub-word units and on the generality of the linguistic properties that mark each type of unit. This paper suggests that the status of a unit type can be evaluated in terms of its informativity. A measure of informativity is obtained by assessing the influence that different unit boundary types have on text compressibility. The results obtained from this initial study support a pair of general conclusions. The first is that unit boundaries primarily reflect a statistical structure, and that the typological variability of linguistic cues reflects the fact that they serve a secondary reinforcing function. The second is that word boundaries are the most informative boundary type, and that the demarcation of words provides the most informative description of the regular patterns in a language.
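The idea of comparing boundary types by their effect on compressibility can be illustrated with a small sketch. This is not the paper's procedure: it assumes zlib (DEFLATE) as a rough stand-in for a compressor, a space as the boundary marker, and a made-up sample text, and every function name below is hypothetical. It measures how many extra compressed bytes a set of boundaries costs when added to a marker-free text, once for the original word boundaries and once for the same number of boundaries placed at random.

```python
import random
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 text (a crude complexity proxy)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def boundary_positions(text, marker=" "):
    """Offsets into the marker-free text at which a boundary marker occurs."""
    positions = []
    offset = 0
    for ch in text:
        if ch == marker:
            positions.append(offset)
        else:
            offset += 1
    return positions


def insert_boundaries(stripped, positions, marker=" "):
    """Re-insert boundary markers into marker-free text at the given offsets."""
    pieces = []
    start = 0
    for cut in sorted(positions):
        pieces.append(stripped[start:cut])
        start = cut
    pieces.append(stripped[start:])
    return marker.join(pieces)


def random_positions(n_chars, k, seed=0):
    """k distinct boundary offsets strictly inside a text of n_chars characters."""
    return random.Random(seed).sample(range(1, n_chars), k)


def boundary_cost(stripped, positions, marker=" "):
    """Extra compressed bytes incurred by adding the boundaries (assumed measure)."""
    with_boundaries = insert_boundaries(stripped, positions, marker)
    return compressed_size(with_boundaries) - compressed_size(stripped)


# Made-up sample text, for illustration only.
sample = "the cat sat on the mat and the dog sat on the log " * 20
word_bounds = boundary_positions(sample)
stripped = sample.replace(" ", "")
rand_bounds = random_positions(len(stripped), len(word_bounds), seed=0)

print("word boundary cost:  ", boundary_cost(stripped, word_bounds))
print("random boundary cost:", boundary_cost(stripped, rand_bounds))
```

Under these assumptions, a boundary type that costs fewer extra bytes is one whose placement is more predictable from the surrounding text. A general-purpose compressor and a toy sample only gesture at the kind of comparison described, so the numbers it prints should not be read as evidence for or against the paper's conclusions.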
Original language | English |
---|---|
Pages (from-to) | 25-48 |
Number of pages | 23 |
Journal | Italian Journal of Linguistics |
Volume | 28 |
Issue number | 1 |
Publication status | Published - 8 Jan 2016 |
Keywords
- linguistic units
- words
- abstractive perspective
- information theory
- Shannon entropy
- Kolmogorov complexity