Abstract
The relation between vocabulary size (V(N)) and text size (N) has been reexamined, with the text represented as a binary sequence. Six texts by three authors from different periods were taken from the Corpus of Serbian Language for analysis. The statistical methods included regression analysis, a randomness test for binary sequences, and stochastic models. The point of equivalence, where the numbers of new and old words are equal, is proposed as a characteristic constant of the text. This constant is independent of N and could be used as an index of vocabulary richness.
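The abstract does not spell out the encoding or the exact definition of the point of equivalence, so the following is only a minimal sketch under stated assumptions: each token is coded 1 if it is a new word (first occurrence) and 0 if it is an old word (repeat), V(N) is the running sum of that sequence, and the point of equivalence is taken to be the first text size N at which the cumulative counts of new and old words are equal, i.e. V(N) = N - V(N). The function names are hypothetical and are not taken from the paper.

```python
from itertools import accumulate
from typing import List, Optional


def to_binary_sequence(tokens: List[str]) -> List[int]:
    """Code each token as 1 (first occurrence) or 0 (repeat). Assumed encoding."""
    seen = set()
    bits = []
    for tok in tokens:
        if tok in seen:
            bits.append(0)
        else:
            seen.add(tok)
            bits.append(1)
    return bits


def vocabulary_growth(bits: List[int]) -> List[int]:
    """Return V(N) for N = 1..len(bits) as the running sum of new-word flags."""
    return list(accumulate(bits))


def point_of_equivalence(bits: List[int]) -> Optional[int]:
    """First N where cumulative new words equal cumulative old words.

    Assumed definition: V(N) == N - V(N). Returns None if never reached.
    """
    new = 0
    for n, bit in enumerate(bits, start=1):
        new += bit
        old = n - new
        if new == old:
            return n
    return None


tokens = "a b a b a a".split()
bits = to_binary_sequence(tokens)
growth = vocabulary_growth(bits)
poe = point_of_equivalence(bits)
```

Under this reading, a lexically richer text keeps producing new words for longer and so reaches the point of equivalence at a larger N. Whether the paper uses cumulative counts or some other formulation would need to be checked against the full text.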
Original language | English |
---|---|
Pages | 47-52 |
Number of pages | 6 |
Publication status | Published - 2003 |
Event | 4th International Workshop on Linguistically Interpreted Corpora at the 10th European Chapter of the Association for Computational Linguistics, LINC@EACL 2003 - Budapest, Hungary |
Duration | 13 Apr 2003 → 14 Apr 2003 |
Conference
Conference | 4th International Workshop on Linguistically Interpreted Corpora at the 10th European Chapter of the Association for Computational Linguistics, LINC@EACL 2003 |
---|---|
Country/Territory | Hungary |
City | Budapest |
Period | 13/04/03 → 14/04/03 |
Bibliographical note
Publisher Copyright: © LINCEACL 2003. All rights reserved.
Keywords
- Binary sequence
- Characteristic constant of the text
- Text size
- Vocabulary size
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language