LDRNet: Enabling Real-Time Document Localization on Mobile Devices

  • Han Wu
  • Holland Qian
  • Huaming Wu*
  • Aad van Moorsel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Modern online services often require mobile devices to convert paper-based information, e.g., passports and ownership documents, into its digital counterpart. This process relies on Document Localization (DL) technology to detect the outline of a document within a photograph. In recent years, demand for real-time DL in live video has increased, especially in financial services. However, existing machine-learning approaches to DL cannot be easily applied due to the large size of the underlying models and the associated long inference time. In this paper, we propose a lightweight DL model, LDRNet, to localize documents in real-time video captured on mobile devices. On the basis of a lightweight backbone neural network, we design three prediction branches for LDRNet: (1) corner point prediction; (2) line border prediction; and (3) document classification. To improve the accuracy, we design novel supplementary targets, the equal-division points, and use a new loss function named Line Loss. We compare the performance of LDRNet with other popular approaches on localization for general documents in a number of datasets. The experimental results show that LDRNet takes significantly less inference time, while still achieving comparable accuracy.
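
The abstract names equal-division points as supplementary targets but does not define them. The sketch below is a minimal illustration under one assumption: that they are the points splitting each border between adjacent corners into equal segments. The function name `equal_division_points`, the `parts` parameter, and the sample coordinates are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def equal_division_points(corners: List[Point], parts: int) -> List[Point]:
    """Split every border of a closed outline into `parts` equal segments.

    Assumption (not stated in the abstract): the supplementary targets lie on
    the straight line between adjacent corners. Corners themselves are
    excluded, so each border contributes `parts - 1` points.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    points: List[Point] = []
    n = len(corners)
    for i in range(n):
        # Border from corner i to the next corner, wrapping around at the end.
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % n]
        for k in range(1, parts):
            t = k / parts  # fraction of the way along the border
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


# Hypothetical document outline, corners listed in order around the border.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
midpoints = equal_division_points(corners, parts=2)
```

With `parts=2` the function returns one midpoint per border; larger values give denser targets along each border, which is one plausible way such points could supervise a line border prediction branch.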

Original language: English
Title of host publication: Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Subtitle of host publication: International Workshops of ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part I
Editors: Irena Koprinska, Paolo Mignone, Riccardo Guidotti, Szymon Jaroszewicz, Holger Fröning, Francesco Gullo, Pedro M. Ferreira, Damian Roqueiro, Gaia Ceddia, Slawomir Nowaczyk, João Gama, Rita Ribeiro, Ricard Gavaldà, Elio Masciari, Zbigniew Ras, Ettore Ritacco, Francesca Naretto, Andreas Theissler, Przemyslaw Biecek, Wouter Verbeke, Gregor Schiele, Franz Pernkopf, Michaela Blott, Ilaria Bordino, Ivan Luciano Danesi, Giovanni Ponti, Lorenzo Severini, Annalisa Appice, Giuseppina Andresini, Ibéria Medeiros, Guilherme Graça, Lee Cooper, Naghmeh Ghazaleh, Jonas Richiardi, Diego Saldana, Konstantinos Sechidis, Arif Canakoglu, Sara Pido, Pietro Pinoli, Albert Bifet, Sepideh Pashami
Publisher: Springer
Pages: 618-629
Number of pages: 12
Edition: 1
ISBN (Electronic): 9783031236181
ISBN (Print): 9783031236174
DOIs
Publication status: Published - 31 Jan 2023
Event: Workshops on SoGood, NFMCP, XKDD, UMOD, ITEM, MIDAS, MLCS, MLBEM, PharML, DALS, IoT-PdM 2022, held in conjunction with the 21st Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022 - Grenoble, France
Duration: 19 Sept 2022 to 23 Sept 2022

Publication series

Name: Communications in Computer and Information Science
Publisher: Springer
Volume: 1752
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: Workshops on SoGood, NFMCP, XKDD, UMOD, ITEM, MIDAS, MLCS, MLBEM, PharML, DALS, IoT-PdM 2022, held in conjunction with the 21st Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022
Country/Territory: France
City: Grenoble
Period: 19/09/22 to 23/09/22

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords

  • Document localization
  • Mobile devices
  • Real time

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics
