TY - CONF
T1 - CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning
T2 - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
AU - Xiong, Kaiqiang
AU - Peng, Rui
AU - Zhang, Zhe
AU - Feng, Tianxing
AU - Jiao, Jianbo
AU - Gao, Feng
AU - Wang, Ronggang
PY - 2024/1/15
N2 - Unsupervised Multi-View Stereo (MVS) methods have achieved promising progress recently. However, previous methods primarily depend on the photometric consistency assumption, which may suffer from two limitations: indistinguishable regions and view-dependent effects, e.g., low-textured areas and reflections. To address these issues, in this paper, we propose a new dual-level contrastive learning approach, named CL-MVSNet. Specifically, our model integrates two contrastive branches into an unsupervised MVS framework to construct additional supervisory signals. On the one hand, we present an image-level contrastive branch to guide the model to acquire more context awareness, thus leading to more complete depth estimation in indistinguishable regions. On the other hand, we exploit a scene-level contrastive branch to boost the representation ability, improving robustness to view-dependent effects. Moreover, to recover more accurate 3D geometry, we introduce an ℒ0.5 photometric consistency loss, which encourages the model to focus more on accurate points while mitigating the gradient penalty of undesirable ones. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate that our approach achieves state-of-the-art performance among all end-to-end unsupervised MVS frameworks and outperforms its supervised counterpart by a considerable margin without fine-tuning.
KW - Geometry
KW - Solid modeling
KW - Computer vision
KW - Three-dimensional displays
KW - Estimation
KW - Context awareness
KW - Benchmark testing
UR - https://jianbojiao.com/pdfs/iccv23_clmvs.pdf
DO - 10.1109/ICCV51070.2023.00349
M3 - Conference contribution
SN - 9798350307191
T3 - International Conference on Computer Vision (ICCV)
SP - 3746
EP - 3757
PB - IEEE
Y2 - 1 October 2023 through 6 October 2023
ER -