UoB-UK at SemEval-2016 Task 1: A Flexible and Extendable System for Semantic Text Similarity using Types, Surprise and Phrase Linking

Harish Tayyar Madabushi; Mark Buhagiar; Mark Lee

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present in this paper a system for measuring Semantic Text Similarity (STS) in English. We introduce three novel techniques: the use of Types, methods of linking phrases, and the use of a Surprise Factor to generate 8,370 similarity measures, which we then combine using Support Vector and Kernel Ridge Regression. Our system outperforms the State of the Art in SemEval 2015, and our best-performing run achieved a score of 0.7094 on the 2016 test set as a whole, and over 0.8 on the majority of the datasets. Additionally, the use of Surprise, Types and phrase linking is not limited to STS and can be used across various Natural Language Processing tasks, while our method of combining scores provides a flexible way of combining variously generated Similarity Scores.
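
As a rough illustration of the score-combination step the abstract mentions, the sketch below fits a kernel ridge regressor that maps several similarity measures for a sentence pair to a single similarity score. It is not the authors' implementation: the three feature columns, the gold scores, the RBF kernel, and the `alpha` and `gamma` values are all assumptions made up for the example, and the Support Vector Regression half of the combination is not shown.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_kernel_ridge(features, targets, alpha=0.1, gamma=1.0):
    """Return dual coefficients w solving (K + alpha * I) w = targets."""
    n = len(features)
    gram = [[rbf_kernel(features[i], features[j], gamma) for j in range(n)]
            for i in range(n)]
    for i in range(n):
        gram[i][i] += alpha  # ridge penalty on the diagonal
    return solve_linear_system(gram, targets)


def predict(features, dual_coef, query, gamma=1.0):
    """Predict a combined score for one new vector of similarity measures."""
    return sum(w * rbf_kernel(x, query, gamma)
               for w, x in zip(dual_coef, features))


# Hypothetical training data: each row holds three similarity measures
# for one sentence pair; each target is an invented gold score in [0, 5].
train_features = [
    [0.90, 0.80, 0.95],
    [0.10, 0.20, 0.05],
    [0.50, 0.60, 0.40],
    [0.70, 0.90, 0.80],
    [0.30, 0.10, 0.20],
]
train_scores = [4.6, 0.5, 2.5, 4.0, 1.0]
alpha = 0.1

dual_coef = fit_kernel_ridge(train_features, train_scores, alpha=alpha)
high = predict(train_features, dual_coef, [0.85, 0.85, 0.90])
low = predict(train_features, dual_coef, [0.15, 0.15, 0.10])
print(f"high-similarity pair: {high:.2f}, low-similarity pair: {low:.2f}")
```

The same pattern scales to any number of measures per pair, since the regressor only sees a feature vector; with thousands of measures a library implementation would be the practical choice.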

Original language	English
Title of host publication	10th International Workshop on Semantic Evaluation (SemEval-2016)
Publication status	Published - 17 Jul 2016

Cite this

@inproceedings{a0d4948d033647bcb38b0e28370aa530,

title = "UoB-UK at SemEval-2016 Task 1: A Flexible and Extendable System for Semantic Text Similarity using Types, Surprise and Phrase Linking",

abstract = "We present in this paper a system for measuring Semantic Text Similarity (STS) in English. We introduce three novel techniques: the use of Types, methods of linking phrases, and the use of a Surprise Factor to generate 8,370 similarity measures, which we then combine using Support Vector and Kernel Ridge Regression. Our system outperforms the State of the Art in SemEval 2015, and our best-performing run achieved a score of 0.7094 on the 2016 test set as a whole, and over 0.8 on the majority of the datasets. Additionally, the use of Surprise, Types and phrase linking is not limited to STS and can be used across various Natural Language Processing tasks, while our method of combining scores provides a flexible way of combining variously generated Similarity Scores.",

author = "Madabushi, {Harish Tayyar} and Buhagiar, Mark and Lee, Mark",

year = "2016",

month = jul,

day = "17",

language = "English",

isbn = "978-1941643952",

booktitle = "10th International Workshop on Semantic Evaluation (SemEval-2016)",

}

TY - GEN

T1 - UoB-UK at SemEval-2016 Task 1

T2 - A Flexible and Extendable System for Semantic Text Similarity using Types, Surprise and Phrase Linking

AU - Madabushi, Harish Tayyar

AU - Buhagiar, Mark

AU - Lee, Mark

PY - 2016/7/17

Y1 - 2016/7/17

N2 - We present in this paper a system for measuring Semantic Text Similarity (STS) in English. We introduce three novel techniques: the use of Types, methods of linking phrases, and the use of a Surprise Factor to generate 8,370 similarity measures, which we then combine using Support Vector and Kernel Ridge Regression. Our system outperforms the State of the Art in SemEval 2015, and our best-performing run achieved a score of 0.7094 on the 2016 test set as a whole, and over 0.8 on the majority of the datasets. Additionally, the use of Surprise, Types and phrase linking is not limited to STS and can be used across various Natural Language Processing tasks, while our method of combining scores provides a flexible way of combining variously generated Similarity Scores.

AB - We present in this paper a system for measuring Semantic Text Similarity (STS) in English. We introduce three novel techniques: the use of Types, methods of linking phrases, and the use of a Surprise Factor to generate 8,370 similarity measures, which we then combine using Support Vector and Kernel Ridge Regression. Our system outperforms the State of the Art in SemEval 2015, and our best-performing run achieved a score of 0.7094 on the 2016 test set as a whole, and over 0.8 on the majority of the datasets. Additionally, the use of Surprise, Types and phrase linking is not limited to STS and can be used across various Natural Language Processing tasks, while our method of combining scores provides a flexible way of combining variously generated Similarity Scores.

M3 - Conference contribution

SN - 978-1941643952

BT - 10th International Workshop on Semantic Evaluation (SemEval-2016)

ER -
