TY - CONF
T1 - CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning
T2 - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
AU - Xiong, Kaiqiang
AU - Peng, Rui
AU - Zhang, Zhe
AU - Feng, Tianxing
AU - Jiao, Jianbo
AU - Gao, Feng
AU - Wang, Ronggang
PY - 2024/1/15
N2 - Unsupervised Multi-View Stereo (MVS) methods have achieved promising progress recently. However, previous methods primarily depend on the photometric consistency assumption, which may suffer from two limitations: indistinguishable regions and view-dependent effects, e.g., low-textured areas and reflections. To address these issues, in this paper, we propose a new dual-level contrastive learning approach, named CL-MVSNet. Specifically, our model integrates two contrastive branches into an unsupervised MVS framework to construct additional supervisory signals. On the one hand, we present an image-level contrastive branch to guide the model to acquire more context awareness, thus leading to more complete depth estimation in indistinguishable regions. On the other hand, we exploit a scene-level contrastive branch to boost the representation ability, improving robustness to view-dependent effects. Moreover, to recover more accurate 3D geometry, we introduce an ℒ0.5 photometric consistency loss, which encourages the model to focus more on accurate points while mitigating the gradient penalty of undesirable ones. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate that our approach achieves state-of-the-art performance among all end-to-end unsupervised MVS frameworks and outperforms its supervised counterpart by a considerable margin without fine-tuning.
KW - Geometry
KW - Solid modeling
KW - Computer vision
KW - Three-dimensional displays
KW - Estimation
KW - Context awareness
KW - Benchmark testing
UR - https://jianbojiao.com/pdfs/iccv23_clmvs.pdf
DO - 10.1109/ICCV51070.2023.00349
M3 - Conference contribution
SN - 9798350307191
T3 - International Conference on Computer Vision (ICCV)
SP - 3746
EP - 3757
PB - IEEE
Y2 - 1 October 2023 through 6 October 2023
ER -