Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia

Fahd Alotaibi; Mark Lee

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper presents a methodology that exploits Arabic Wikipedia to support the automatic development of a large fine-grained Named Entity (NE) corpus and gazetteer. The cornerstone of the approach is the efficient classification of Wikipedia articles into target NE classes. The resources developed were thoroughly evaluated to ensure reliability and high quality. Results show that the developed gazetteer boosts the performance of the NE classifier on the newswire domain by at least 2 points of F-measure. Moreover, combining a learning NE classifier with the developed corpus achieves a high F-measure of 85.18%. The developed resources overcome the limitations of traditional Arabic NE tasks by offering a more fine-grained analysis and providing a beneficial route for further studies.
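The abstract reports its results as F-measure scores without defining the metric. As a minimal sketch, assuming the balanced F1 variant computed from entity-level counts (the abstract does not say which variant or matching scheme the paper uses), the score can be computed as follows. The function name `f_measure` and the counts are illustrative assumptions, not figures from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    beta = 1.0 gives the balanced F1 (harmonic mean of precision and recall).
    """
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted entities that are correct
    recall = tp / (tp + fn)     # share of gold entities that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (hypothetical, not taken from the paper).
score = f_measure(tp=8, fp=2, fn=2)
print(f"F1 = {score:.2%}")
```

Under this definition, a gain of "2 points" means the score rises by 0.02 on the 0 to 1 scale, for example from 0.80 to 0.82.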

Original language	English
Title of host publication	6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference
Editors	Ruslan Mitkov, Jong C. Park
Publisher	Asian Federation of Natural Language Processing
Pages	392-400
Number of pages	9
ISBN (Electronic)	9784990734800
Publication status	Published - 2013
Event	6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan Duration: 14 Oct 2013 → …

Publication series

Name	6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference

Conference

Conference	6th International Joint Conference on Natural Language Processing, IJCNLP 2013
Country/Territory	Japan
City	Nagoya
Period	14/10/13 → …

Bibliographical note

Publisher Copyright:
© IJCNLP 2013. All rights reserved.

ASJC Scopus subject areas

Artificial Intelligence
Software

Cite this

Alotaibi, F., & Lee, M. (2013). Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia. In R. Mitkov, & J. C. Park (Eds.), 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference (pp. 392-400). (6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference). Asian Federation of Natural Language Processing.

Alotaibi, Fahd ; Lee, Mark. / Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia. 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference. editor / Ruslan Mitkov ; Jong C. Park. Asian Federation of Natural Language Processing, 2013. pp. 392-400 (6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference).

@inproceedings{6f883308df9240b5b48e3baf55d748f1,

title = "Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia",

abstract = "This paper presents a methodology to exploit the potential of Arabic Wikipedia to assist in the automatic development of a large Fine-grained Named Entity (NE) corpus and gazetteer. The corner stone of this approach is efficient classification of Wikipedia articles to target NE classes. The resources developed were thoroughly evaluated to ensure reliability and a high quality. Results show the developed gazetteer boosts the performance of the NE classifier on a news-wire domain by at least 2 points F-measure. Moreover, by combining a learning NE classifier with the developed corpus the score achieved is a high F-measure of 85.18%. The developed resources overcome the limitations of traditional Arabic NE tasks by more fine-grained analysis and providing a beneficial route for further studies.",

author = "Fahd Alotaibi and Mark Lee",

year = "2013",

language = "English",

series = "6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference",

publisher = "Asian Federation of Natural Language Processing",

pages = "392--400",

editor = "Ruslan Mitkov and Park, {Jong C.}",

booktitle = "6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference",

}

Alotaibi, F & Lee, M 2013, Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia. in R Mitkov & JC Park (eds), 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference. 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference, Asian Federation of Natural Language Processing, pp. 392-400, 6th International Joint Conference on Natural Language Processing, IJCNLP 2013, Nagoya, Japan, 14/10/13.

Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia. / Alotaibi, Fahd; Lee, Mark.
6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference. ed. / Ruslan Mitkov; Jong C. Park. Asian Federation of Natural Language Processing, 2013. p. 392-400 (6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference).

TY - GEN

T1 - Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia

AU - Alotaibi, Fahd

AU - Lee, Mark

PY - 2013

Y1 - 2013

N2 - This paper presents a methodology to exploit the potential of Arabic Wikipedia to assist in the automatic development of a large Fine-grained Named Entity (NE) corpus and gazetteer. The corner stone of this approach is efficient classification of Wikipedia articles to target NE classes. The resources developed were thoroughly evaluated to ensure reliability and a high quality. Results show the developed gazetteer boosts the performance of the NE classifier on a news-wire domain by at least 2 points F-measure. Moreover, by combining a learning NE classifier with the developed corpus the score achieved is a high F-measure of 85.18%. The developed resources overcome the limitations of traditional Arabic NE tasks by more fine-grained analysis and providing a beneficial route for further studies.

AB - This paper presents a methodology to exploit the potential of Arabic Wikipedia to assist in the automatic development of a large Fine-grained Named Entity (NE) corpus and gazetteer. The corner stone of this approach is efficient classification of Wikipedia articles to target NE classes. The resources developed were thoroughly evaluated to ensure reliability and a high quality. Results show the developed gazetteer boosts the performance of the NE classifier on a news-wire domain by at least 2 points F-measure. Moreover, by combining a learning NE classifier with the developed corpus the score achieved is a high F-measure of 85.18%. The developed resources overcome the limitations of traditional Arabic NE tasks by more fine-grained analysis and providing a beneficial route for further studies.

UR - http://www.scopus.com/inward/record.url?scp=84946066439&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84946066439

T3 - 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference

SP - 392

EP - 400

BT - 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference

A2 - Mitkov, Ruslan

A2 - Park, Jong C.

PB - Asian Federation of Natural Language Processing

T2 - 6th International Joint Conference on Natural Language Processing, IJCNLP 2013

Y2 - 14 October 2013

ER -

Alotaibi F, Lee M. Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia. In Mitkov R, Park JC, editors, 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference. Asian Federation of Natural Language Processing. 2013. p. 392-400. (6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference).
