Self-supervised Video Representation Learning by Pace Prediction

Jiangliu Wang, Jianbo Jiao, Yun Hui Liu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

38 Citations (Scopus)

Abstract

This paper addresses the problem of self-supervised video representation learning from a new perspective – by video pace prediction. It stems from the observation that human visual system is sensitive to video pace, e.g., slow motion, a widely used technique in film making. Specifically, given a video played in natural pace, we randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip. The assumption here is that the network can only succeed in such a pace reasoning task when it understands the underlying video content and learns representative spatio-temporal features. In addition, we further introduce contrastive learning to push the model towards discriminating different paces by maximizing the agreement on similar video content. To validate the effectiveness of the proposed method, we conduct extensive experiments on action recognition and video retrieval tasks with several alternative network architectures. Experimental evaluations show that our approach achieves state-of-the-art performance for self-supervised video representation learning across different network architectures and different benchmarks. The code and pre-trained models are available at https://github.com/laura-wang/video-pace.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings
EditorsAndrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
PublisherSpringer
Pages504-521
Number of pages18
ISBN (Print)9783030585198
DOIs
Publication statusPublished - 2020
Event16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
Duration: 23 Aug 202028 Aug 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12362 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference16th European Conference on Computer Vision, ECCV 2020
Country/TerritoryUnited Kingdom
CityGlasgow
Period23/08/2028/08/20

Bibliographical note

Funding Information:
Acknowledgements. This work was partially supported by the HK RGC TRS under T42-409/18-R, the VC Fund 4930745 of the CUHK T Stone Robotics Institute, CUHK, and the EPSRC Programme Grant Seebibyte EP/M013774/1.

Publisher Copyright:
© 2020, Springer Nature Switzerland AG.

Keywords

  • Pace
  • Self-supervised learning
  • Video representation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Self-supervised Video Representation Learning by Pace Prediction'. Together they form a unique fingerprint.

Cite this