Surface-SOS: Self-Supervised Object Segmentation via Neural Surface Representation

Xiaoyun Zheng; Liwei Liao; Jianbo Jiao; Feng Gao; Ronggang Wang

doi:10.1109/TIP.2024.3374199

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. Under conditions of multi-camera inputs, the structural, textural and geometrical consistency across views can be leveraged to achieve fine-grained object segmentation. To make better use of this information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework that segments objects in each view using a 3D surface representation built from multi-view images of a scene. To model high-quality geometry surfaces for complex scenes, we design a novel scene representation scheme, which decomposes the scene into two complementary neural representation modules, each based on a Signed Distance Function (SDF). Moreover, Surface-SOS is able to refine single-view segmentation with multi-view unlabeled images, by introducing coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to break the dependence on large amounts of annotated data and strong constraints. These constraints typically involve observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks including LLFF, CO3D, BlendedMVS, TUM and several real-world scenes show that Surface-SOS always yields finer object masks than its NeRF-based counterparts and surpasses supervised single-view baselines remarkably.
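The idea of reading a per-view object mask off a surface representation can be illustrated with a toy example. The sketch below is not the Surface-SOS method: it replaces the learned neural SDFs with two hand-written analytic ones (a sphere standing in for the object, a plane standing in for the background), assumes an orthographic camera, and combines the two surfaces with a simple minimum, which the abstract does not state. All function names and parameters are hypothetical.

```python
import numpy as np

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=0.5):
    """Signed distance to a sphere (toy 'object'): negative inside, positive outside."""
    return np.linalg.norm(p - np.asarray(center), axis=-1) - radius

def plane_sdf(p, z_plane=2.0):
    """Signed distance to the plane z = z_plane (toy 'background'), positive in front of it."""
    return z_plane - p[..., 2]

def segment_view(origins, directions, n_steps=128, eps=1e-3):
    """Sphere-trace each ray against the union of both surfaces, then label the
    pixel as object if the object SDF is the smaller one where the ray stopped."""
    t = np.zeros(len(origins))
    for _ in range(n_steps):
        p = origins + t[:, None] * directions
        d = np.minimum(sphere_sdf(p), plane_sdf(p))  # union of the two surfaces
        t = np.where(d > eps, t + d, t)              # advance only rays not yet on a surface
    p = origins + t[:, None] * directions
    return sphere_sdf(p) < plane_sdf(p)

# Assumed orthographic view: a 4x4 pixel grid at z = -3 looking along +z.
xs = np.linspace(-1.0, 1.0, 4)
gx, gy = np.meshgrid(xs, xs)
origins = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, -3.0)], axis=-1)
directions = np.tile([0.0, 0.0, 1.0], (gx.size, 1))
mask = segment_view(origins, directions).reshape(gx.shape)
print(mask.astype(int))
```

Rendering the same two SDFs from a second camera would give a mask that agrees with this one on the underlying 3D object, which is the kind of multi-view consistency the abstract refers to.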

Original language	English
Pages (from-to)	2018 - 2031
Journal	IEEE Transactions on Image Processing
Volume	33
DOIs	https://doi.org/10.1109/TIP.2024.3374199
Publication status	Published - 12 Mar 2024

Bibliographical note

Funding Agency:
Outstanding Talents Training Fund in Shenzhen
10.13039/501100017610-Shenzhen Science and Technology Program-Shenzhen Cultivation of Excellent Scientific and Technological Innovation Talents Project (Grant Number: RCJC20200714114435057)
10.13039/501100017610-Shenzhen Science and Technology Program-Shenzhen Hong Kong Joint (Grant Number: SGDX20211123144400001)
10.13039/501100001809-National Natural Science Foundation of China (Grant Number: U21B2012)
10.13039/501100007937-Migu Cultural Technology Company Ltd., (Migu)-Peking University Meta Vision Technology Innovation Laboratory
10.13039/501100000288-Royal Society (Grant Number: IES\R3\223050 and SIF\R1\231009)

Keywords

multi-view object segmentation
neural surface representation
Self-supervised learning

ASJC Scopus subject areas

Software
Computer Graphics and Computer-Aided Design

Access to Document

10.1109/TIP.2024.3374199
Licence: None: All rights reserved

ZhenghX2024Surface-SOS_AAM
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript, 14.5 MB
Licence: Other (please specify with Rights Statement)

Cite this

@article{1f3296cb39204349a0c974bb9f1fee24,

title = "Surface-SOS: Self-Supervised Object Segmentation via Neural Surface Representation",

abstract = "Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. Under conditions of multi-camera inputs, the structural, textural and geometrical consistency among each view can be leveraged to achieve fine-grained object segmentation. To make better use of the above information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework to segment objects for each view by 3D surface representation from multi-view images of a scene. To model high-quality geometry surfaces for complex scenes, we design a novel scene representation scheme, which decomposes the scene into two complementary neural representation modules respectively with a Signed Distance Function (SDF). Moreover, Surface-SOS is able to refine single-view segmentation with multi-view unlabeled images, by introducing coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to break the dependence on large amounts of annotated data and strong constraints. These constraints typically involve observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks including LLFF, CO3D, BlendedMVS, TUM and several real-world scenes show that Surface-SOS always yields finer object masks than its NeRF-based counterparts and surpasses supervised single-view baselines remarkably.",

keywords = "multi-view object segmentation, neural surface representation, Self-supervised learning",

author = "Xiaoyun Zheng and Liwei Liao and Jianbo Jiao and Feng Gao and Ronggang Wang",

note = "Funding Agency: Outstanding Talents Training Fund in Shenzhen 10.13039/501100017610-Shenzhen Science and Technology Program-Shenzhen Cultivation of Excellent Scientific and Technological Innovation Talents Project (Grant Number: RCJC20200714114435057) 10.13039/501100017610-Shenzhen Science and Technology Program-Shenzhen Hong Kong Joint (Grant Number: SGDX20211123144400001) 10.13039/501100001809-National Natural Science Foundation of China (Grant Number: U21B2012) 10.13039/501100007937-Migu Cultural Technology Company Ltd., (Migu)-Peking University Meta Vision Technology Innovation Laboratory 10.13039/501100000288-Royal Society (Grant Number: IES\R3\223050 and SIF\R1\231009)",

year = "2024",

month = mar,

day = "12",

doi = "10.1109/TIP.2024.3374199",

language = "English",

volume = "33",

pages = "2018 -- 2031",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

}

TY - JOUR

T1 - Surface-SOS

T2 - Self-Supervised Object Segmentation via Neural Surface Representation

AU - Zheng, Xiaoyun

AU - Liao, Liwei

AU - Jiao, Jianbo

AU - Gao, Feng

AU - Wang, Ronggang

N1 - Funding Agency: Outstanding Talents Training Fund in Shenzhen 10.13039/501100017610-Shenzhen Science and Technology Program-Shenzhen Cultivation of Excellent Scientific and Technological Innovation Talents Project (Grant Number: RCJC20200714114435057) 10.13039/501100017610-Shenzhen Science and Technology Program-Shenzhen Hong Kong Joint (Grant Number: SGDX20211123144400001) 10.13039/501100001809-National Natural Science Foundation of China (Grant Number: U21B2012) 10.13039/501100007937-Migu Cultural Technology Company Ltd., (Migu)-Peking University Meta Vision Technology Innovation Laboratory 10.13039/501100000288-Royal Society (Grant Number: IES\R3\223050 and SIF\R1\231009)

PY - 2024/3/12

Y1 - 2024/3/12

N2 - Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. Under conditions of multi-camera inputs, the structural, textural and geometrical consistency among each view can be leveraged to achieve fine-grained object segmentation. To make better use of the above information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework to segment objects for each view by 3D surface representation from multi-view images of a scene. To model high-quality geometry surfaces for complex scenes, we design a novel scene representation scheme, which decomposes the scene into two complementary neural representation modules respectively with a Signed Distance Function (SDF). Moreover, Surface-SOS is able to refine single-view segmentation with multi-view unlabeled images, by introducing coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to break the dependence on large amounts of annotated data and strong constraints. These constraints typically involve observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks including LLFF, CO3D, BlendedMVS, TUM and several real-world scenes show that Surface-SOS always yields finer object masks than its NeRF-based counterparts and surpasses supervised single-view baselines remarkably.

AB - Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. Under conditions of multi-camera inputs, the structural, textural and geometrical consistency among each view can be leveraged to achieve fine-grained object segmentation. To make better use of the above information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework to segment objects for each view by 3D surface representation from multi-view images of a scene. To model high-quality geometry surfaces for complex scenes, we design a novel scene representation scheme, which decomposes the scene into two complementary neural representation modules respectively with a Signed Distance Function (SDF). Moreover, Surface-SOS is able to refine single-view segmentation with multi-view unlabeled images, by introducing coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to break the dependence on large amounts of annotated data and strong constraints. These constraints typically involve observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks including LLFF, CO3D, BlendedMVS, TUM and several real-world scenes show that Surface-SOS always yields finer object masks than its NeRF-based counterparts and surpasses supervised single-view baselines remarkably.

KW - multi-view object segmentation

KW - neural surface representation

KW - Self-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85187991186&partnerID=8YFLogxK

U2 - 10.1109/TIP.2024.3374199

DO - 10.1109/TIP.2024.3374199

M3 - Article

C2 - 38470593

AN - SCOPUS:85187991186

SN - 1057-7149

VL - 33

SP - 2018

EP - 2031

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

ER -
