Abstract
Recent work in Video Frame Interpolation (VFI) formulates VFI as a diffusion-based conditional image generation problem, synthesizing the intermediate frame from random noise and the neighboring frames. Because videos have relatively high resolution, Latent Diffusion Models (LDMs) are employed to run the diffusion process efficiently in latent space. Such a formulation poses a crucial challenge: VFI expects the output to be deterministically equal to the ground-truth intermediate frame, but LDMs generate a diverse set of images when the model is run multiple times. This diversity stems from the large cumulative variance (the variance accumulated at each generation step) of the generated latent representations in LDMs, which makes the sampling trajectory random. To address this problem, we propose Frame Interpolation with Consecutive Brownian Bridge Diffusion. Specifically, we propose a consecutive Brownian Bridge diffusion that takes a deterministic initial value as input, resulting in a much smaller cumulative variance of the generated latent representations. Our experiments suggest that our method improves as the autoencoder improves and achieves state-of-the-art performance in VFI, leaving strong potential for further enhancement. Our code is available at https://github.com/ZonglinL/ConsecutiveBrownianBridge.
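As a rough illustration of why pinning a diffusion at deterministic endpoints keeps its variance small, the sketch below simulates a standard scalar Brownian bridge. It is a toy example under stated assumptions, not the authors' consecutive Brownian Bridge formulation or their latent-space model: the function names, the scalar state, the unit time horizon, and the endpoint values are all hypothetical choices made for illustration. A bridge pinned at both ends of [0, T] has variance t(T - t)/T, which peaks at T/4 and returns to zero at the endpoint, whereas free Brownian motion has variance t, which keeps growing.

```python
import random
import statistics


def bridge_variance(t, T=1.0):
    """Analytic variance at time t of a standard Brownian bridge pinned on [0, T]."""
    return t * (T - t) / T


def simulate_bridge_midpoint(x0, xT, T=1.0, steps=50, n_paths=4000, seed=0):
    """Monte Carlo samples of a bridge from x0 to xT, read off at t = T/2.

    Hypothetical helper for illustration only; scalar state, unit diffusion.
    """
    rng = random.Random(seed)
    dt = T / steps
    k = steps // 2          # index of the midpoint
    t = k * dt
    mids = []
    for _ in range(n_paths):
        # Free Brownian motion W on a uniform grid.
        w = [0.0]
        for _ in range(steps):
            w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))
        # Pin both ends: B_t = x0 + (t/T)(xT - x0) + W_t - (t/T) W_T.
        mids.append(x0 + (t / T) * (xT - x0) + w[k] - (t / T) * w[-1])
    return mids


mids = simulate_bridge_midpoint(x0=-1.0, xT=1.0)
empirical_var = statistics.pvariance(mids)

print("bridge variance at midpoint (analytic):", bridge_variance(0.5))
print("bridge variance at midpoint (simulated):", empirical_var)
print("free Brownian motion variance at t = T:", 1.0)
```

In this toy setting the simulated midpoint variance should land close to the analytic value of T/4, well below the variance a free Brownian motion reaches by the end of the same interval.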
| Original language | English |
|---|---|
| Title of host publication | MM '24 |
| Subtitle of host publication | Proceedings of the 32nd ACM International Conference on Multimedia |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 3449-3458 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798400706868 |
| DOIs | |
| Publication status | Published - 28 Oct 2024 |
| Event | 32nd ACM International Conference on Multimedia, Melbourne, Australia. Duration: 28 Oct 2024 → 1 Nov 2024. https://2024.acmmm.org/ |
Conference
| Conference | 32nd ACM International Conference on Multimedia |
|---|---|
| Abbreviated title | MM '24 |
| Country/Territory | Australia |
| City | Melbourne |
| Period | 28/10/24 → 1/11/24 |
| Internet address | https://2024.acmmm.org/ |
Keywords
- Brownian bridge
- diffusion models
- video frame interpolation
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Graphics and Computer-Aided Design
- Human-Computer Interaction
- Software