Abstract
This research addresses two major challenges in studying second language acquisition and bilingualism: reducing overlap among predictor variables and correctly classifying participants into language proficiency levels. Including many relevant predictors can harm statistical analysis because the predictors are more likely to be correlated with one another, a problem known as multicollinearity. To tackle this, we apply Principal Component Analysis (PCA) to selected predictors to derive proficiency indicators that combine length of stay in the UK and language test scores. Additionally, traditional methods, especially IELTS-based proficiency classifications, often miss subtle differences in language skills, particularly when they fail to consider how long participants have been exposed to the target language. We counter this by using non-hierarchical Cluster Analysis (NCA) as a grounded, data-driven way of detecting distinct language proficiency groups. This new approach is demonstrated on a dataset of eye movements from reading tasks, collected from Chinese-English bilinguals in the UK.
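The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes only two predictors (months in the UK and a test score), uses invented toy data, takes k-means as one common non-hierarchical clustering method, and picks two clusters arbitrarily. All function and variable names are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Return z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def first_component(x, y):
    """PC1 scores for two standardized variables.

    For a 2x2 correlation matrix [[1, r], [r, 1]] the eigenvectors are
    (1, 1)/sqrt(2) and (1, -1)/sqrt(2) with eigenvalues 1 + r and 1 - r,
    so PC1 is the sum direction when r >= 0 and the difference otherwise.
    """
    zx, zy = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(zx, zy)) / (len(zx) - 1)  # Pearson r
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / math.sqrt(2) for a, b in zip(zx, zy)]
    explained = (1 + abs(r)) / 2  # share of total variance carried by PC1
    return scores, explained

def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's k-means on one-dimensional scores."""
    def assign(centres):
        return [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]

    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted scores.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        labels = assign(centres)
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            new_centres.append(statistics.mean(members) if members else centres[j])
        if new_centres == centres:  # converged
            break
        centres = new_centres
    return assign(centres), centres

# Hypothetical toy data, invented for illustration (not from the study).
months_in_uk = [3, 6, 8, 12, 18, 24, 36, 48, 60, 72]
test_scores = [5.5, 6.0, 6.0, 6.5, 6.5, 7.0, 7.0, 7.5, 8.0, 8.0]

pc1_scores, explained = first_component(months_in_uk, test_scores)
labels, centres = kmeans_1d(pc1_scores, k=2)  # k=2 is an arbitrary choice

print(f"PC1 explains {explained:.0%} of the variance")
for months, score, label in zip(months_in_uk, test_scores, labels):
    print(months, score, "-> cluster", label)
```

Because the two predictors are collapsed into a single component before clustering, the grouping reflects both exposure and test performance, and the correlated predictors no longer enter a later model separately. With more than two predictors, a general-purpose PCA routine would replace the closed-form step used here.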
Original language | English |
---|---|
Journal | Linguistic Approaches to Bilingualism |
Early online date | 28 Apr 2025 |
DOIs | |
Publication status | E-pub ahead of print - 28 Apr 2025 |
Keywords
- principal component analysis
- non-hierarchical cluster analysis
- generalised additive mixed models
- eye movements in reading
- L2 proficiency classification