Abstract
We propose a novel optimization framework that crops a given image based on a user description and aesthetics. Unlike existing image cropping methods, which typically train a deep network to regress to crop parameters or cropping actions, we directly optimize the cropping parameters by repurposing pre-trained networks for image captioning and aesthetic assessment, without any fine-tuning, thereby avoiding the need to train a separate network. Specifically, we search for the crop parameters that minimize a combined loss of the original objectives of these networks. To make the optimization stable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, thereby effectively reducing the parameter space, and (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our framework produces crops that are well aligned with the intended user description and aesthetically pleasing.
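To make the described optimization concrete, the sketch below shows one possible single-run loop in PyTorch: crop parameters are optimized by gradient descent through differentiable bilinear sampling, with the crop scale annealed over iterations. The `captioner.caption_loss` and `aesthetic_net` interfaces, the loss weight `lam`, and the schedules are hypothetical stand-ins for the frozen pre-trained models and settings, which the abstract does not specify; the multi-scale sampling and aggregation of multiple runs are omitted for brevity.

```python
# Minimal sketch, assuming PyTorch and hypothetical frozen models:
#   captioner.caption_loss(crop, tokens) -> scalar caption loss (lower = better match)
#   aesthetic_net(crop) -> aesthetic score (higher = more pleasing)
import torch
import torch.nn.functional as F

def crop_params_to_grid(cx, cy, s, out_size=224):
    """Build an affine sampling grid for a crop of scale s centered at (cx, cy)."""
    theta = torch.stack([
        torch.stack([s, torch.zeros_like(s), cx]),
        torch.stack([torch.zeros_like(s), s, cy]),
    ]).unsqueeze(0)  # shape (1, 2, 3)
    return F.affine_grid(theta, [1, 3, out_size, out_size], align_corners=False)

def optimize_crop(image, caption_tokens, captioner, aesthetic_net,
                  steps=200, lam=1.0, scale_range=(1.0, 0.3)):
    # Crop parameters: center (cx, cy) in [-1, 1] and scale s, all learned directly.
    params = torch.zeros(3, requires_grad=True)
    opt = torch.optim.Adam([params], lr=0.05)
    for t in range(steps):
        # Anneal the maximum allowed scale, shrinking the search space over time.
        max_s = scale_range[0] + (scale_range[1] - scale_range[0]) * t / steps
        cx, cy = torch.tanh(params[0]), torch.tanh(params[1])
        s = torch.sigmoid(params[2]) * max_s
        grid = crop_params_to_grid(cx, cy, s)
        crop = F.grid_sample(image, grid, align_corners=False)  # bilinear sampling
        # Combined objective of the two frozen networks: caption match + aesthetics.
        loss = captioner.caption_loss(crop, caption_tokens) \
               - lam * aesthetic_net(crop).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params.detach()
```

In the framework described above, several such runs would be performed and their results aggregated; this sketch shows only one optimization trajectory.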
| Original language | English |
|---|---|
| Article number | 108485 |
| Pages (from-to) | 108485 |
| Journal | Pattern Recognition |
| Volume | 126 |
| Early online date | 14 Feb 2022 |
| DOIs | |
| Publication status | E-pub ahead of print - 14 Feb 2022 |
Keywords
- cs.CV
- cs.CL
- Image cropping
- Aesthetics
- Deep network re-purposing
- Image captioning