Abstract
This paper presents a methodology that exploits Arabic Wikipedia to assist in the automatic development of a large fine-grained Named Entity (NE) corpus and gazetteer. The cornerstone of this approach is the efficient classification of Wikipedia articles into target NE classes. The resources developed were thoroughly evaluated to ensure reliability and high quality. Results show that the developed gazetteer boosts the performance of the NE classifier on the newswire domain by at least 2 F-measure points. Moreover, combining a learning-based NE classifier with the developed corpus achieves an F-measure of 85.18%. The developed resources overcome the limitations of traditional Arabic NE tasks by supporting more fine-grained analysis and providing a beneficial route for further studies.
Original language | English |
---|---|
Title of host publication | 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference |
Editors | Ruslan Mitkov, Jong C. Park |
Publisher | Asian Federation of Natural Language Processing |
Pages | 392-400 |
Number of pages | 9 |
ISBN (Electronic) | 9784990734800 |
Publication status | Published - 2013 |
Event | 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan; Duration: 14 Oct 2013 → … |
Publication series
Name | 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference |
---|
Conference
Conference | 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 |
---|---|
Country/Territory | Japan |
City | Nagoya |
Period | 14/10/13 → … |
Bibliographical note
Publisher Copyright: © IJCNLP 2013. All rights reserved.
ASJC Scopus subject areas
- Artificial Intelligence
- Software