A Linear Grammar Approach to Mathematical Formula Recognition from PDF

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many approaches have been proposed over the years for the recognition of mathematical formulae from scanned documents. More recently a need has arisen to recognise formulae from PDF documents. Here we can avoid ambiguities introduced by traditional OCR approaches and instead extract perfect knowledge of the characters used in formulae directly from the document. This can be exploited by formula recognition techniques to achieve correct results and high performance.

In this paper we revisit an old grammatical approach to formula recognition, that of Anderson from 1968, and assess its applicability with respect to data extracted from PDF documents. We identify some problems of the original method when applied to common mathematical expressions and show how they can be overcome. The simplicity of the original method leads to a very efficient recognition technique that not only is very simple to implement but also yields results of high accuracy for the recognition of mathematical formulae from PDF documents.

Details

Original language: English
Title of host publication: Intelligent Computer Mathematics
Subtitle of host publication: 16th Symposium, Calculemus 2009, 8th International Conference, MKM 2009, Held as Part of CICM 2009, Grand Bend, Canada, July 6-12, 2009. Proceedings
Editors: J Carette, L Dixon, C Sacerdoti Coen, SM Watt
Publication status: Published - 6 Jul 2009
Event: 16th Symposium, Calculemus 2009, 8th International Conference, MKM 2009, Held as Part of CICM 2009 - Grand Bend, Canada
Duration: 6 Jul 2009 - 12 Jul 2009

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 5625
ISSN (Print): 0302-9743

Conference

Conference: 16th Symposium, Calculemus 2009, 8th International Conference, MKM 2009, Held as Part of CICM 2009
Country: Canada
City: Grand Bend
Period: 6/07/09 - 12/07/09