TY - GEN
T1 - Diffuse3D: Wide-Angle 3D Photography via Bilateral Diffusion
T2 - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
AU - Jiang, Yutao
AU - Zhou, Yang
AU - Liang, Yuan
AU - Liu, Wenxi
AU - Jiao, Jianbo
AU - Quan, Yuhui
AU - He, Shengfeng
PY - 2024/1/15
AB - This paper aims to resolve the challenging problem of wide-angle novel view synthesis from a single image, a.k.a. wide-angle 3D photography. Existing approaches rely on local context and treat all neighboring pixels equally when inpainting occluded RGB and depth regions; consequently, they fail to handle large-region occlusion (i.e., observation from an extreme angle), and foreground layers may blend into the background inpainting. To address these issues, we propose Diffuse3D, which employs a pre-trained diffusion model for global synthesis while amending the model to activate depth-aware inference. Our key insight is to alter the convolution mechanism in the denoising process. We inject depth information into the denoising convolution operation with bilateral kernels, i.e., a depth kernel and a spatial kernel, to account for layered correlations among pixels. In this way, foreground regions are ignored during background inpainting, and only pixels close in depth are leveraged. In addition, we propose a global-local balancing approach to exploit both global and local contextual understanding. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in novel view synthesis, especially in wide-angle scenarios. More importantly, our method requires no training and is a plug-and-play module that can be integrated with any diffusion model. Our code can be found at https://github.com/yutaojiang1/Diffuse3D.
KW - Photography
KW - Training
KW - Computer vision
KW - Three-dimensional displays
KW - Image resolution
KW - Correlation
KW - Convolution
UR - https://jianbojiao.com/pdfs/iccv23_diffuse3d.pdf
DO - 10.1109/ICCV51070.2023.00826
M3 - Conference contribution
SN - 9798350307191
T3 - International Conference on Computer Vision (ICCV)
SP - 8964
EP - 8974
BT - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
PB - IEEE
Y2 - 1 October 2023 through 6 October 2023
ER -