Repurposing existing deep networks for caption and aesthetic-guided image cropping

Nora Horanyi, Kedi Xia, Kwang Moo Yi, Abhishake Kumar Bojja, Ales Leonardis, Hyung Jin Chang

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a novel optimization framework that crops a given image based on a user description and aesthetics. Unlike existing image cropping methods, which typically train a deep network to regress crop parameters or cropping actions, we directly optimize the cropping parameters by repurposing networks pre-trained on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding the need to train a separate network. Specifically, we search for the crop parameters that minimize a combined loss built from the original objectives of these networks. To make the optimization stable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, which effectively reduces the parameter space, and (iii) aggregation of multiple optimization results. Through quantitative and qualitative evaluations, we show that our framework produces crops that are well aligned with the intended user descriptions and aesthetically pleasing.
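The three stabilization strategies listed in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the captioning and aesthetic networks are replaced by a hypothetical `toy_loss` that simply rewards bright crops, gradients come from central finite differences instead of backpropagation, and the concrete reading of each strategy is an assumption. Here multi-scale bilinear sampling is taken to mean averaging the loss over crops sampled at several output resolutions, scale annealing shrinks the crop side linearly while only the crop centre is optimized, and aggregation takes the median of several randomly initialized runs. All function names and parameter values (`bilinear_crop`, `s_start`, `lr`, and so on) are illustrative.

```python
import numpy as np


def bilinear_crop(image, cx, cy, scale, out_size):
    """Bilinearly sample a square out_size x out_size crop from a 2-D image.

    (cx, cy) is the crop centre and `scale` its side length, all in normalized
    [0, 1] image coordinates (x = column, y = row).
    """
    h, w = image.shape
    t = np.linspace(-0.5, 0.5, out_size)
    ys = np.clip((cy + t * scale) * (h - 1), 0, h - 1)
    xs = np.clip((cx + t * scale) * (w - 1), 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = image[np.ix_(y0, x0)] * (1 - wx) + image[np.ix_(y0, x1)] * wx
    bottom = image[np.ix_(y1, x0)] * (1 - wx) + image[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy


def multiscale_loss(image, cx, cy, scale, loss_fn, sizes=(8, 16, 32)):
    """Strategy (i), as assumed here: average the loss over several sampling resolutions."""
    return np.mean([loss_fn(bilinear_crop(image, cx, cy, scale, n)) for n in sizes])


def optimize_crop(image, loss_fn, init, s_start=0.9, s_end=0.4, steps=80, lr=0.05, eps=1e-3):
    """Strategy (ii): descend on the crop centre while the side length is annealed."""
    cx, cy = init
    for k in range(steps):
        scale = s_start + (s_end - s_start) * k / (steps - 1)  # linear annealing schedule
        lo, hi = scale / 2, 1 - scale / 2  # keep the crop inside the image
        cx, cy = np.clip(cx, lo, hi), np.clip(cy, lo, hi)
        # Central finite differences stand in for autodiff in this sketch.
        gx = (multiscale_loss(image, cx + eps, cy, scale, loss_fn)
              - multiscale_loss(image, cx - eps, cy, scale, loss_fn)) / (2 * eps)
        gy = (multiscale_loss(image, cx, cy + eps, scale, loss_fn)
              - multiscale_loss(image, cx, cy - eps, scale, loss_fn)) / (2 * eps)
        cx, cy = np.clip(cx - lr * gx, lo, hi), np.clip(cy - lr * gy, lo, hi)
    return np.array([cx, cy, scale])


def aggregate_crops(image, loss_fn, n_restarts=5, seed=0):
    """Strategy (iii), as assumed here: median of several randomly initialized runs."""
    rng = np.random.default_rng(seed)
    results = [optimize_crop(image, loss_fn, rng.uniform(0.2, 0.8, size=2))
               for _ in range(n_restarts)]
    return np.median(results, axis=0)


def toy_loss(crop):
    """Hypothetical stand-in for the combined caption + aesthetic loss: prefer bright crops."""
    return -crop.mean()


# Synthetic 64x64 image with a Gaussian "subject" centred at x = 0.3, y = 0.7.
yy, xx = np.mgrid[0:1:64j, 0:1:64j]
image = np.exp(-((xx - 0.3) ** 2 + (yy - 0.7) ** 2) / (2 * 0.15 ** 2))
cx, cy, scale = aggregate_crops(image, toy_loss)
print(f"centre=({cx:.2f}, {cy:.2f}), side={scale:.2f}")
```

On this synthetic image the aggregated crop should settle near the blob centre (0.3, 0.7) with the final side length 0.4. Because the schedule fixes the scale at every step, each run searches over only the two centre coordinates, which mirrors the abstract's point that annealing the scale reduces the parameter space.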
Original language: English
Article number: 108485
Pages (from-to): 108485
Journal: Pattern Recognition
Volume: 126
Early online date: 14 Feb 2022
Publication status: E-pub ahead of print, 14 Feb 2022

Keywords

  • cs.CV
  • cs.CL
  • Image cropping
  • Aesthetics
  • Deep network re-purposing
  • Image captioning
