<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="DESCRIPTION META TAG">
<meta property="og:title" content="SOCIAL MEDIA TITLE TAG"/>
<meta property="og:description" content="SOCIAL MEDIA DESCRIPTION TAG"/>
<meta property="og:url" content="URL OF THE WEBSITE"/>
<meta property="og:image" content="static/image/your_banner_image.png" />
<meta property="og:image:width" content="1200"/>
<meta property="og:image:height" content="630"/>
<meta name="twitter:title" content="TWITTER BANNER TITLE META TAG">
<meta name="twitter:description" content="TWITTER BANNER DESCRIPTION META TAG">
<meta name="twitter:image" content="static/images/your_twitter_banner_image.png">
<meta name="twitter:card" content="summary_large_image">
<meta name="keywords" content="KEYWORDS SHOULD BE PLACED HERE">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>InstanceGaussian</title>
<link rel="icon" type="image/x-icon" href="static/images/favicon.ico">
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="static/css/bulma.min.css">
<link rel="stylesheet" href="static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="static/css/bulma-slider.min.css">
<link rel="stylesheet" href="static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="static/css/index.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
<script defer src="static/js/fontawesome.all.min.js"></script>
<script src="static/js/bulma-carousel.min.js"></script>
<script src="static/js/bulma-slider.min.js"></script>
<script src="static/js/index.js"></script>
<style>
body {
font-family: "Noto Sans", sans-serif;
}
.title, .subtitle {
text-align: center;
}
figure img {
max-width: 100%;
height: auto;
margin: 20px 0;
border: 1px solid #ccc;
border-radius: 10px;
}
figure figcaption {
text-align: center;
font-size: 0.9rem;
color: gray;
margin-top: 5px;
margin-bottom: 25px;
}
.content p {
width: 80%;
line-height: 1.6;
margin: 0 auto;
}
footer {
text-align: center;
padding: 20px;
background-color: #f9f9f9;
font-size: 0.8rem;
}
</style>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container">
<div class="columns is-centered">
<div class="column is-12 has-text-centered">
<h1 class="title is-2">InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception</h1>
<div class="is-size-5">
<p>
<a href="https://villa.jianzhang.tech/people/haijie-li-%E6%9D%8E%E6%B5%B7%E6%9D%B0/" target="_blank">Haijie Li</a><sup>1</sup>,
<a href="https://yanmin-wu.github.io/" target="_blank">Yanmin Wu</a><sup>1</sup>,
Jiarui Meng<sup>1</sup>,
<a href="https://villa.jianzhang.tech/people/qiankun-gao-%E9%AB%98%E4%B9%BE%E5%9D%A4/" target="_blank">Qiankun Gao</a><sup>1</sup>,
Zhiyao Zhang<sup>2</sup>,
Ronggang Wang<sup>1</sup>,
<a href="https://jianzhang.tech/" target="_blank">Jian Zhang</a><sup>1</sup>
</p>
<p></p>
<p>
<sup>1</sup>Peking University, <sup>2</sup>Northeastern University
</p>
</div>
<div class="buttons is-centered">
<a href="https://arxiv.org/pdf/2411.19235" target="_blank" class="button is-dark is-rounded">
<span class="icon"><i class="fas fa-file-pdf"></i></span>
<span>Paper</span>
</a>
<a href="https://arxiv.org/abs/2411.19235" target="_blank" class="button is-dark is-rounded">
<span class="icon"><i class="ai ai-arxiv"></i></span>
<span>arXiv</span>
</a>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container">
<h2 class="title is-3">Framework</h2>
<figure>
<img src="static/images/framework_00.png" alt="Framework image" style="width: 80%;display: block; margin: auto;">
<figcaption>
<p>Top: The appearance-semantic joint Gaussian representation avoids the imbalance and inconsistency in appearance-semantic learning.</p>
<p>Bottom: Bottom-up instantiation. Over-segmentation is achieved via farthest point sampling (FPS) and clustering, followed by instantiation through graph-connectivity-based aggregation.</p>
</figcaption>
</figure>
</div>
</section>
<section class="section hero is-light">
<div class="container">
<h2 class="title is-3">Abstract</h2>
<div class="content">
<p>
3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations.
</p>
<p>
However, three major challenges remain in leveraging 3DGS for scene understanding:
<strong>1)</strong> an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes;
<strong>2)</strong> inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and
<strong>3)</strong> reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation.
</p>
<p>
In this work, we propose <strong>InstanceGaussian</strong>, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include:
<strong>i)</strong> a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation;
<strong>ii)</strong> a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and
<strong>iii)</strong> a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies.
</p>
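<p>
As a rough illustration of the bottom-up idea in contribution iii), the Python sketch below over-segments a point set with farthest point sampling and then merges segments by connected component analysis. It is not the authors' implementation: the function names, the <code>link_dist</code> and <code>min_cos</code> thresholds, and the linking rule (nearby segment centroids with similar mean features) are assumptions made only for this example.
</p>
<pre><code>
```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Pick k indices; each new pick is the point farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = np.linalg.norm(points - points[start], axis=1)
    while k > len(chosen):
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def connected_components(num_nodes, edges):
    """Label nodes so that nodes joined by a path of edges share a label (union-find)."""
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    roots = [find(i) for i in range(num_nodes)]
    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    relabel = {r: n for n, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots])


def aggregate_instances(points, features, k, link_dist, min_cos):
    """Hypothetical pipeline: FPS seeds, nearest-seed segments, then graph merging.

    Assumes the k seeds are distinct points, so every segment is non-empty.
    """
    seeds = farthest_point_sampling(points, k)
    # Over-segmentation: each point joins its nearest seed.
    dist = np.linalg.norm(points[:, None, :] - points[None, seeds, :], axis=2)
    segment = np.argmin(dist, axis=1)
    # Mean position and unit-normalized mean feature of each segment.
    centers = np.stack([points[segment == s].mean(axis=0) for s in range(k)])
    feats = np.stack([features[segment == s].mean(axis=0) for s in range(k)])
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    # Assumed linking rule: segments are connected when their centroids are
    # close and their mean features have high cosine similarity.
    edges = [
        (a, b)
        for a in range(k)
        for b in range(a + 1, k)
        if link_dist >= np.linalg.norm(centers[a] - centers[b])
        and feats[a] @ feats[b] >= min_cos
    ]
    instance_of_segment = connected_components(k, edges)
    return instance_of_segment[segment]
```
</code></pre>
<p>
Calling <code>aggregate_instances(points, features, k, link_dist, min_cos)</code> with an (N, 3) array of positions and an (N, D) array of per-point features returns one instance label per point. Because merging depends only on graph connectivity between segments, no category labels or preset instance count are needed in this sketch.
</p>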
</div>
</div>
</section>
<section class="section">
<div class="container">
<h2 class="title is-4">Results</h2>
<figure>
<img src="static/images/instance_00.png" alt="Instance segmentation result" style="width: 80%;display: block; margin: auto;">
<figcaption>Visual comparison of category-agnostic 3D instance segmentation results.</figcaption>
</figure>
<figure>
<img src="static/images/openv_00.png" alt="Open vocabulary results" style="width: 80%;display: block; margin: auto;">
<figcaption>Open-vocabulary query-based point cloud understanding on the ScanNet dataset.</figcaption>
</figure>
<figure>
<img src="static/images/lerf_00.png" alt="Open-vocabulary 3D object selection results" style="width: 80%;display: block; margin: auto;">
<figcaption>Open-vocabulary 3D object selection and rendering on the LERF dataset.</figcaption>
</figure>
<figure>
<img src="static/images/grasp_00.png" alt="Category-agnostic instance segmentation results on GraspNet" style="width: 80%;display: block; margin: auto;">
<figcaption><p>Top: Reference images of the scenes. Middle: Constructed 3D Gaussians/points.</p>
<p>Bottom: Visualization of category-agnostic 3D instance segmentation results on the GraspNet dataset.</p></figcaption>
</figure>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container content">
<h2 class="title">BibTeX</h2>
<pre><code>
@misc{li2024instancegaussianappearancesemanticjointgaussian,
title={InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception},
author={Haijie Li and Yanmin Wu and Jiarui Meng and Qiankun Gao and Zhiyao Zhang and Ronggang Wang and Jian Zhang},
year={2024},
eprint={2411.19235},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.19235},
}
</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<p>
This page was built using the <a href="https://github.com/eliahuhorwitz/Academic-project-page-template" target="_blank">Academic Project Page Template</a>. Licensed under <a href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank">CC BY-SA 4.0</a>.
</p>
</div>
</footer>
</body>
</html>