Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia

Fahd Alotaibi, Mark Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper presents a methodology that exploits Arabic Wikipedia to automatically develop a large fine-grained Named Entity (NE) corpus and gazetteer. The cornerstone of this approach is the efficient classification of Wikipedia articles into target NE classes. The developed resources were thoroughly evaluated to ensure their reliability and high quality. Results show that the developed gazetteer improves the performance of the NE classifier on the newswire domain by at least 2 points of F-measure. Moreover, combining a learning-based NE classifier with the developed corpus achieves a high F-measure of 85.18%. The developed resources overcome the limitations of traditional Arabic NE tasks by supporting more fine-grained analysis, and they provide a useful route for further studies.

Original language: English
Title of host publication: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference
Editors: Ruslan Mitkov, Jong C. Park
Publisher: Asian Federation of Natural Language Processing
Pages: 392-400
Number of pages: 9
ISBN (Electronic): 9784990734800
Publication status: Published - 2013
Event: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan
Duration: 14 Oct 2013 → …

Publication series

Name: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference

Conference

Conference: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013
Country/Territory: Japan
City: Nagoya
Period: 14/10/13 → …

Bibliographical note

Publisher Copyright:
© IJCNLP 2013. All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software