We investigate internal and stylistic factors affecting binary and ternary relativizer choice in subject (that vs which) and non-subject (that vs which vs zero) relative clauses. We employ a novel methodological approach to predicting relativizers: Bayesian regression modeling with the dimensional reduction of model inputs via factor analysis. Our factor analysis is motivated by the high degree of redundancy and collinearity in natural language data, while Bayesian regression models are robust to effects of data sparseness and (near) separation. We find that in both types of relative clauses, the more marked variant (which) is preferred in complex contexts, while the unmarked variant (that, or zero in NSRCs) is favored in contexts where the relative clause is short and more fully integrated with the NP it modifies. We also find that use of which is somewhat more sensitive to stylistic considerations in subject than in non-subject relative clauses, and that which correlates most strongly with features associated with lexical density, e. g. ‘nouniness’, rather than those often associated with formality, e. g. passivization and sentence length.
|Journal||Corpus Linguistics and Linguistic Theory|
|Early online date||25 Nov 2016|
|Publication status||E-pub ahead of print - 25 Nov 2016|
- English relativizers
- stylistic variation
- Bayesian modelling