Unified Image and Video Saliency Modeling

Richard Droste*, Jianbo Jiao, J. Alison Noble

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Citations (Scopus)

Abstract

Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify different sources of domain shift between image and video saliency data and between different video saliency datasets as a key challenge for effective joint modelling. To address this we propose four novel domain adaptation techniques—Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing and Bypass-RNN—in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and MIT300. With one set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state-of-the-art for image saliency datasets, despite faster runtime and a 5 to 20-fold smaller model size compared to all competing deep methods. We provide retrospective analyses and ablation studies which confirm the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal.
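The abstract names learned Gaussian priors and Domain-Adaptive Priors but does not say how they are parameterized. The sketch below illustrates only the generic idea behind both:

- Gaussian prior maps are rendered from a few per-map parameters: centre and per-axis spread.
- Each data domain (dataset) can have its own set of such maps, so a spatial bias fitted to image data need not be shared with a video dataset.

This is a minimal illustration under stated assumptions, not UNISAL's formulation. The class and function names (`GaussianPrior`, `render_prior`, `DOMAIN_PRIORS`, `domain_prior_maps`) are hypothetical, and so are the parameter values. In a trained model the parameters would be learned, and the maps would be combined with network features, which is not shown. The authors' actual implementation is in the repository linked above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianPrior:
    """One 2D Gaussian prior in normalized image coordinates ([0, 1] per axis).

    Hypothetical parameterization: centre (mu_x, mu_y) and per-axis standard
    deviations. In a trained model these would be learnable parameters.
    """
    mu_x: float
    mu_y: float
    sigma_x: float
    sigma_y: float


def render_prior(p: GaussianPrior, height: int, width: int) -> list[list[float]]:
    """Evaluate an unnormalized Gaussian exp(-d^2 / 2) at every pixel centre."""
    grid = []
    for i in range(height):
        y = (i + 0.5) / height  # pixel-centre y coordinate in [0, 1]
        row = []
        for j in range(width):
            x = (j + 0.5) / width  # pixel-centre x coordinate in [0, 1]
            # Squared Mahalanobis distance for an axis-aligned Gaussian
            d2 = ((x - p.mu_x) / p.sigma_x) ** 2 + ((y - p.mu_y) / p.sigma_y) ** 2
            row.append(math.exp(-0.5 * d2))
        grid.append(row)
    return grid


# Hypothetical per-domain parameter sets: each data source gets its own priors.
# The values are illustrative placeholders, NOT trained UNISAL parameters.
DOMAIN_PRIORS: dict[str, list[GaussianPrior]] = {
    "SALICON": [GaussianPrior(0.5, 0.5, 0.25, 0.25)],
    "DHF1K": [
        GaussianPrior(0.5, 0.45, 0.2, 0.15),
        GaussianPrior(0.3, 0.5, 0.3, 0.3),
    ],
}


def domain_prior_maps(domain: str, height: int, width: int) -> list[list[list[float]]]:
    """Return one rendered prior map per Gaussian registered for `domain`."""
    return [render_prior(p, height, width) for p in DOMAIN_PRIORS[domain]]
```

For example, `domain_prior_maps("DHF1K", 4, 6)` returns two 4×6 maps, while `"SALICON"` returns one. Keying parameters by domain is one simple way to keep domain-specific statistics separate while the rest of a network is shared. Whether UNISAL selects priors this way is an assumption here.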

Original language: English
Title of host publication: Computer Vision – ECCV 2020
Subtitle of host publication: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Publisher: Springer
Pages: 419–435
Number of pages: 17
Edition: 1
ISBN (Electronic): 9783030585587
ISBN (Print): 9783030585570
Publication status: Published – 29 Oct 2020
Event: 16th European Conference on Computer Vision, ECCV 2020, Glasgow, United Kingdom
Duration: 23 Aug 2020 – 28 Aug 2020

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12350
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th European Conference on Computer Vision, ECCV 2020
Country/Territory: United Kingdom
City: Glasgow
Period: 23/08/20 – 28/08/20

Bibliographical note

Publisher Copyright:
© 2020, Springer Nature Switzerland AG.

Keywords

  • Domain adaptation
  • Video saliency
  • Visual saliency

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
