Abstract
Modern online services often require users to digitize paper documents, such as passports and ownership documents, using mobile devices. This process relies on Document Localization (DL) technology to detect the outline of a document within a photograph. In recent years, demand for real-time DL in live video has grown, especially in financial services. However, existing machine-learning approaches to DL cannot be easily applied because the underlying models are large and their inference time is long. In this paper, we propose a lightweight DL model, LDRNet, to localize documents in real-time video captured on mobile devices. On top of a lightweight backbone neural network, we design three prediction branches for LDRNet: (1) corner point prediction, (2) line border prediction, and (3) document classification. To improve accuracy, we design novel supplementary targets, the equal-division points, and use a new loss function named Line Loss. We compare the performance of LDRNet with other popular approaches to general document localization on a number of datasets. The experimental results show that LDRNet takes significantly less inference time while still achieving comparable accuracy.
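The abstract names the equal-division points as supplementary targets but does not say how they are computed. The sketch below is one plausible reading, not the authors' implementation: it assumes the points are the interior points that split each edge between adjacent corners into equal segments. The function name `equal_division_points` and the parameter `n_div` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def equal_division_points(corners: List[Point], n_div: int) -> List[List[Point]]:
    """Return, for each of the four edges, the interior points that split
    the edge into n_div equal segments (an assumed definition)."""
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    if n_div < 1:
        raise ValueError("n_div must be at least 1")

    per_edge: List[List[Point]] = []
    for i in range(4):
        # Each edge runs from corner i to the next corner, wrapping around.
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        # Linear interpolation; k = 0 and k = n_div are the corners themselves.
        pts = [
            (x0 + (x1 - x0) * k / n_div, y0 + (y1 - y0) * k / n_div)
            for k in range(1, n_div)
        ]
        per_edge.append(pts)
    return per_edge


# Hypothetical document outline: a 4 x 2 axis-aligned rectangle.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
points = equal_division_points(corners, 4)
print(points[0])
```

Under this assumption, a network that regresses these extra points alongside the four corners receives a denser description of each border than the corners alone provide.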
| Original language | English |
|---|---|
| Title of host publication | Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
| Subtitle of host publication | International Workshops of ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part I |
| Editors | Irena Koprinska, Paolo Mignone, Riccardo Guidotti, Szymon Jaroszewicz, Holger Fröning, Francesco Gullo, Pedro M. Ferreira, Damian Roqueiro, Gaia Ceddia, Slawomir Nowaczyk, João Gama, Rita Ribeiro, Ricard Gavaldà, Elio Masciari, Zbigniew Ras, Ettore Ritacco, Francesca Naretto, Andreas Theissler, Przemyslaw Biecek, Wouter Verbeke, Gregor Schiele, Franz Pernkopf, Michaela Blott, Ilaria Bordino, Ivan Luciano Danesi, Giovanni Ponti, Lorenzo Severini, Annalisa Appice, Giuseppina Andresini, Ibéria Medeiros, Guilherme Graça, Lee Cooper, Naghmeh Ghazaleh, Jonas Richiardi, Diego Saldana, Konstantinos Sechidis, Arif Canakoglu, Sara Pido, Pietro Pinoli, Albert Bifet, Sepideh Pashami |
| Publisher | Springer |
| Pages | 618-629 |
| Number of pages | 12 |
| Edition | 1 |
| ISBN (Electronic) | 9783031236181 |
| ISBN (Print) | 9783031236174 |
| Publication status | Published - 31 Jan 2023 |
| Event | Workshops on SoGood, NFMCP, XKDD, UMOD, ITEM, MIDAS, MLCS, MLBEM, PharML, DALS, IoT-PdM 2022, held in conjunction with the 21st Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022, Grenoble, France. Duration: 19 Sept 2022 → 23 Sept 2022 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Publisher | Springer |
| Volume | 1752 |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | Workshops on SoGood, NFMCP, XKDD, UMOD, ITEM, MIDAS, MLCS, MLBEM, PharML, DALS, IoT-PdM 2022, held in conjunction with the 21st Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022 |
|---|---|
| Country/Territory | France |
| City | Grenoble |
| Period | 19/09/22 → 23/09/22 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- Document localization
- Mobile devices
- Real time
ASJC Scopus subject areas
- General Computer Science
- General Mathematics