Abstract
A rigorous psychometric approach is crucial for the accurate measurement of mind-reading abilities. Traditional scoring methods for such tests, which involve lengthy free-text responses, require considerable time and human effort. This study investigates the use of large language models (LLMs) to automate the scoring of psychometric tests. Data were collected from participants aged 13 to 30 years and scored by trained human coders to establish a benchmark. We evaluated multiple LLMs against human assessments, exploring various prompting strategies to optimize performance and fine-tuning the models using a subset of the collected data to enhance accuracy. Our results demonstrate that LLMs can assess advanced mind-reading abilities with over 90% accuracy on average. Notably, on most test items, the LLMs achieved higher kappa agreement with the lead coder than two trained human coders did, highlighting their potential to reliably score open-response psychometric tests.
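The two agreement measures named in the abstract, accuracy and kappa, can be illustrated with a minimal sketch. It assumes the kappa in question is Cohen's kappa between two raters (the abstract does not say which variant was used), and the 0 to 2 score scale and the score lists are invented purely for illustration; they are not data from the study.

```python
from collections import Counter


def accuracy(rater_a, rater_b):
    """Fraction of items on which the two raters assign the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label
    # if each rated independently according to their own label frequencies.
    labels = set(rater_a) | set(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical item scores on an assumed 0-2 scale (not from the paper).
lead_coder = [0, 1, 2, 2, 1, 0, 2, 1]
llm_scores = [0, 1, 2, 1, 1, 0, 2, 1]

print(f"accuracy: {accuracy(lead_coder, llm_scores):.3f}")  # 0.875
print(f"kappa:    {cohen_kappa(lead_coder, llm_scores):.3f}")  # 0.810
```

Kappa is lower than raw accuracy here because part of the observed agreement would be expected by chance given how often each rater uses each score. Comparing an LLM's kappa against the lead coder with a second human coder's kappa against the same lead coder is one way to read the comparison the abstract reports.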
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025) |
| Publisher | Association for Computational Linguistics, ACL |
| Pages | 79–89 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798891762268 |
| Publication status | Published - May 2025 |
| Event | The Workshop on Computational Linguistics and Clinical Psychology - Albuquerque, United States; Duration: 3 May 2025 → 3 May 2025; Conference number: 10; https://clpsych.org/call-for-papers/ |
Conference
| Conference | The Workshop on Computational Linguistics and Clinical Psychology |
|---|---|
| Abbreviated title | CLPsych 2025 |
| Country/Territory | United States |
| City | Albuquerque |
| Period | 3/05/25 → 3/05/25 |
Keywords
- mind-reading
- large language models