An automated system for grammatical analysis of Twitter messages. A learning task application

Mourad Oussalah; B. Escallier; D. Daher

doi:10.1016/j.knosys.2016.02.015

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

This paper describes an educational study that uses Twitter to enhance high school students’ interaction while improving the linguistic quality of their messages. For this purpose, an interactive system has been developed to collect tweets and analyze them from a grammatical perspective. The automated system comprises a comprehensive data normalization phase, which identifies any unknown token, and a grammatical analysis system. The latter uses logical reasoning on a bi-gram token representation, together with simple rule-based reasoning for named-entity detection. The developed system also supports spatial, topic-based and identity-based search. In addition, as soon as a linguistic inconsistency is detected, the system sends an alert to the moderator(s), together with statistics on user activity, so that appropriate action can be taken. The automated system thus identifies both text normalization issues and grammatical inconsistencies, the latter through logical reasoning based on bi-gram matching against Wikipedia. A statistical analysis of the tweets gathered from the students who took part in this study has been carried out. In addition, the contribution of peers to improving the linguistic quality of users’ messages has been quantified and investigated. The study demonstrates the participants’ interest in this new learning experience and evaluates the influence of peers on their writing skills. In particular, the visibility and noticeability of Twitter messages to a large audience were found to contribute substantially to raising students’ awareness of the linguistic quality of their messages. The study also revealed the predominance of slang in the students’ daily Twitter writing. These slang abbreviations were shown to pose the greatest challenge for automatic text analysis. Similarly, named-entity identification and handling proved very challenging, especially because capitalization in Twitter messages is often used for emphasis as well.
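The abstract describes the pipeline only at a high level. The Python sketch below illustrates the three ideas it names (normalization that flags unknown tokens, a bi-gram check against a reference collection, and a simple capitalization rule for named entities) under clearly stated assumptions: the toy `SLANG`, `LEXICON` and `REFERENCE_BIGRAMS` sets, the `Report` class and the `analyse_tweet` function are all hypothetical stand-ins, not the authors' implementation. In the paper the reference bi-grams come from Wikipedia; here they are a small hand-written set.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy resources. A real system would use a full English lexicon,
# a curated slang/abbreviation list and bigrams harvested from Wikipedia.
SLANG = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow"}
LEXICON = {"you", "are", "great", "see", "tomorrow", "we", "will",
           "in", "the", "park"}
REFERENCE_BIGRAMS = {("you", "are"), ("are", "great"), ("we", "will"),
                     ("will", "see"), ("see", "you"), ("you", "in"),
                     ("in", "the"), ("the", "park")}


@dataclass
class Report:
    normalized: list[str] = field(default_factory=list)
    unknown_tokens: list[str] = field(default_factory=list)
    suspicious_bigrams: list[tuple[str, str]] = field(default_factory=list)
    entity_candidates: list[str] = field(default_factory=list)


def analyse_tweet(text: str) -> Report:
    """Hypothetical three-step check: normalize tokens, flag unknown tokens
    and unseen bigrams, and guess named entities with a capitalization rule."""
    report = Report()
    # Crude tokenizer: '#' and '@' are dropped, so hashtags become plain words.
    raw_tokens = re.findall(r"[A-Za-z0-9']+", text)
    entity_positions: set[int] = set()
    for i, tok in enumerate(raw_tokens):
        low = tok.lower()
        # Named-entity rule: capitalized, not the first token, unknown to the
        # lexicon and not ALL CAPS (ALL CAPS often signals emphasis on Twitter).
        if i > 0 and tok[0].isupper() and not tok.isupper() and low not in LEXICON:
            report.entity_candidates.append(tok)
            entity_positions.add(i)
            report.normalized.append(tok)  # keep suspected entities verbatim
            continue
        norm = SLANG.get(low, low)  # expand slang and abbreviations
        if norm not in LEXICON:
            report.unknown_tokens.append(tok)
        report.normalized.append(norm)
    # Bigram check: flag adjacent pairs never seen in the reference collection,
    # skipping any pair that involves a suspected named entity.
    for i in range(len(report.normalized) - 1):
        if i in entity_positions or i + 1 in entity_positions:
            continue
        pair = (report.normalized[i], report.normalized[i + 1])
        if pair not in REFERENCE_BIGRAMS:
            report.suspicious_bigrams.append(pair)
    return report


if __name__ == "__main__":
    print(analyse_tweet("we will see u in park the"))
```

Skipping bi-grams that touch a suspected entity reflects the abstract's observation that named entities complicate the analysis, and the ALL-CAPS exception reflects its point that capitals on Twitter are often used for emphasis rather than to mark a name.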

Original language	English
Pages (from-to)	31-47
Journal	Knowledge-Based Systems
Volume	101
Early online date	11 Mar 2016
DOIs	https://doi.org/10.1016/j.knosys.2016.02.015
Publication status	Published - 1 Jun 2016

Keywords

Data mining
Twitter
Social network
Learning

Access to Document

10.1016/j.knosys.2016.02.015
Licence: None (all rights reserved)

Oussalah_et_al_An_automated_system_Knowledge-Based_Systems_2016
Eligibility for repository checked: 21/04/16
Accepted author manuscript, 993 KB
Licence: Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Cite this

@article{b7b71c1863924d8cbc883f2d2c7a6100,

title = "An automated system for grammatical analysis of {Twitter} messages. A learning task application",

abstract = "This paper describes an educational study that uses Twitter to enhance high school students{\textquoteright} interaction while improving the linguistic quality of their messages. For this purpose, an interactive system has been developed to collect tweets and analyze them from a grammatical perspective. The automated system comprises a comprehensive data normalization phase, which identifies any unknown token, and a grammatical analysis system. The latter uses logical reasoning on a bi-gram token representation, together with simple rule-based reasoning for named-entity detection. The developed system also supports spatial, topic-based and identity-based search. In addition, as soon as a linguistic inconsistency is detected, the system sends an alert to the moderator(s), together with statistics on user activity, so that appropriate action can be taken. The automated system thus identifies both text normalization issues and grammatical inconsistencies, the latter through logical reasoning based on bi-gram matching against Wikipedia. A statistical analysis of the tweets gathered from the students who took part in this study has been carried out. In addition, the contribution of peers to improving the linguistic quality of users{\textquoteright} messages has been quantified and investigated. The study demonstrates the participants{\textquoteright} interest in this new learning experience and evaluates the influence of peers on their writing skills. In particular, the visibility and noticeability of Twitter messages to a large audience were found to contribute substantially to raising students{\textquoteright} awareness of the linguistic quality of their messages. The study also revealed the predominance of slang in the students{\textquoteright} daily Twitter writing. These slang abbreviations were shown to pose the greatest challenge for automatic text analysis. Similarly, named-entity identification and handling proved very challenging, especially because capitalization in Twitter messages is often used for emphasis as well.",

keywords = "Data mining, Twitter, Social network, Learning",

author = "Mourad Oussalah and B. Escallier and D. Daher",

year = "2016",

month = jun,

day = "1",

doi = "10.1016/j.knosys.2016.02.015",

language = "English",

volume = "101",

pages = "31--47",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

publisher = "Elsevier",

}

TY  - JOUR

T1  - An automated system for grammatical analysis of Twitter messages. A learning task application

AU  - Oussalah, Mourad

AU  - Escallier, B.

AU  - Daher, D.

PY  - 2016/6/1

Y1  - 2016/6/1

AB  - This paper describes an educational study that uses Twitter to enhance high school students’ interaction while improving the linguistic quality of their messages. For this purpose, an interactive system has been developed to collect tweets and analyze them from a grammatical perspective. The automated system comprises a comprehensive data normalization phase, which identifies any unknown token, and a grammatical analysis system. The latter uses logical reasoning on a bi-gram token representation, together with simple rule-based reasoning for named-entity detection. The developed system also supports spatial, topic-based and identity-based search. In addition, as soon as a linguistic inconsistency is detected, the system sends an alert to the moderator(s), together with statistics on user activity, so that appropriate action can be taken. The automated system thus identifies both text normalization issues and grammatical inconsistencies, the latter through logical reasoning based on bi-gram matching against Wikipedia. A statistical analysis of the tweets gathered from the students who took part in this study has been carried out. In addition, the contribution of peers to improving the linguistic quality of users’ messages has been quantified and investigated. The study demonstrates the participants’ interest in this new learning experience and evaluates the influence of peers on their writing skills. In particular, the visibility and noticeability of Twitter messages to a large audience were found to contribute substantially to raising students’ awareness of the linguistic quality of their messages. The study also revealed the predominance of slang in the students’ daily Twitter writing. These slang abbreviations were shown to pose the greatest challenge for automatic text analysis. Similarly, named-entity identification and handling proved very challenging, especially because capitalization in Twitter messages is often used for emphasis as well.

KW  - Data mining

KW  - Twitter

KW  - Social network

KW  - Learning

U2  - 10.1016/j.knosys.2016.02.015

DO  - 10.1016/j.knosys.2016.02.015

M3  - Article

SN  - 0950-7051

VL  - 101

SP  - 31

EP  - 47

JO  - Knowledge-Based Systems

JF  - Knowledge-Based Systems

ER  -
