Abstract
This paper presents a system for measuring Semantic Text Similarity (STS) in English. We introduce three novel techniques: the use of Types, methods of linking phrases, and the use of a Surprise Factor. We use these to generate 8,370 similarity measures, which we then combine using Support Vector and Kernel Ridge Regression. Our system outperforms the state of the art in SemEval 2015, and our best-performing run achieved a score of 0.7094 on the 2016 test set as a whole and over 0.8 on the majority of the datasets. Additionally, Surprise, Types and phrase linking are not limited to STS and can be used across various Natural Language Processing tasks, while our method of combining scores provides a flexible way of combining variously generated similarity scores.
Original language | English |
---|---|
Title of host publication | 10th International Workshop on Semantic Evaluation (SemEval-2016) |
Subtitle of host publication | Proceedings of the Workshop |
Publisher | Association for Computational Linguistics, ACL |
Pages | 680-685 |
Number of pages | 5 |
ISBN (Electronic) | 978-1941643952 |
Publication status | Published - 17 Jul 2016 |
Event | 10th International Workshop on Semantic Evaluation (SemEval 2016) - San Diego. Duration: 16 Jun 2016 → 17 Jun 2016 |
Conference
Conference | 10th International Workshop on Semantic Evaluation (SemEval 2016) |
---|---|
City | San Diego |
Period | 16/06/16 → 17/06/16 |
Keywords
- Semantic analysis
- Natural language processing