Commit
Merge branch 'main' of github.com:3dlg-hcvc/DuoduoCLIP into main
hanhung committed Nov 28, 2024
2 parents 9c0aa18 + 30b3ca5 commit 9c5322b
Showing 5 changed files with 10 additions and 8 deletions.
Binary file added docs/assets/images/model-attn.png
Binary file modified docs/assets/images/model-param.png
Binary file modified docs/assets/images/objaverse-acc.png
2 changes: 1 addition & 1 deletion docs/feed.xml
@@ -1 +1 @@
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2024-06-18T10:12:45+08:00</updated><id>/feed.xml</id><title type="html">DuoduoCLIP</title><subtitle></subtitle><author><name>default</name><email>default</email></author></feed>
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2024-10-17T14:51:01-07:00</updated><id>/feed.xml</id><title type="html">DuoduoCLIP</title><subtitle></subtitle><author><name>default</name><email>default</email></author></feed>
16 changes: 9 additions & 7 deletions docs/index.html
@@ -46,17 +46,19 @@ <h2 id="abstract">Abstract</h2>
<p>We introduce Duoduo CLIP, a model for 3D representation learning that learns shape encodings from multi-view images instead of point-clouds.
The choice of multi-view images allows us to leverage 2D priors from off-the-shelf CLIP models to facilitate fine-tuning with 3D data.
Our approach not only shows better generalization compared to existing point cloud methods, but also reduces GPU requirements and training time.
In addition, we modify the model with cross-view attention to leverage information across multiple frames of the object which further boosts performance.
Compared to the current SOTA point cloud method that requires 480 A100 hours to train 1 billion model parameters we only require 57 A5000 hours and 87 million parameters.
Multi-view images also provide more flexibility in use cases compared to point clouds.
This includes being able to encode objects with a variable number of images, with better performance when more views are used.
This is in contrast to point cloud based methods, where an entire scan or model of an object is required.
We showcase this flexibility with object retrieval from images of real-world objects. Our model also achieves better performance in more fine-grained text to shape retrieval, demonstrating better text-and-shape alignment than point cloud based models.</p>
In addition, the model is modified with cross-view attention to leverage information across multiple frames of the object which further boosts performance.
Notably, our model is permutation invariant to the order of multi-view images while being pose-free.
Compared to the current SOTA point cloud method, which requires 480 A100 GPU hours to train 1 billion model parameters, we require only 57 A5000 GPU hours and 87 million parameters.
Multi-view images also provide more flexibility, including the ability to encode objects with a variable number of images, with performance scaling as more views are used.
In contrast, point cloud based methods require an entire scan or model of the object.
We showcase this flexibility with benchmarks from images of real-world objects.
Our model also achieves better performance in more fine-grained text to shape retrieval, demonstrating better text-and-shape alignment than point cloud based models.</p>
</blockquote>
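The cross-view attention and view-order invariance described in the added abstract lines can be illustrated with a minimal sketch. This is not the authors' actual implementation; the array shapes, projection matrices, and mean pooling here are illustrative assumptions:

```python
# Minimal sketch of cross-view attention over multi-view image tokens.
# Shapes and weight matrices are illustrative assumptions, not the
# paper's actual architecture.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_encode(views, Wq, Wk, Wv):
    """Encode a variable number of views into one shape embedding.

    views: (n_views, n_tokens, dim) patch tokens per view. Tokens from
    all views attend to each other jointly; because no positional
    encoding is applied over the view axis and the result is mean-pooled,
    the embedding is invariant to the order of the views (pose-free).
    """
    v, t, d = views.shape
    x = views.reshape(v * t, d)            # flatten views into one token set
    q, k, val = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))   # tokens attend across all views
    return (attn @ val).mean(axis=0)       # pooling removes view-order info

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
views = rng.normal(size=(6, 4, 8))         # 6 views, 4 tokens each, dim 8
shuffled = views[rng.permutation(6)]       # same views, different order
e1 = cross_view_encode(views, Wq, Wk, Wv)
e2 = cross_view_encode(shuffled, Wq, Wk, Wv)
assert np.allclose(e1, e2)                 # view order does not matter
```

Because a variable number of views simply changes the size of the flattened token set, the same encoder handles any view count, matching the flexibility claim in the abstract.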

<h2 id="model">Model</h2>

<p><img src="./assets/images/model.png" alt="alt text" /></p>
<p><img src="./assets/images/model.png" alt="alt text" />
<img src="./assets/images/model-attn.png" alt="alt text" /></p>

<h2 id="synthetic-retrieval">Synthetic Retrieval</h2>

