Upload image Post API done

libai-lab · Oct 23, 2023 · commit 3376423

Showing 46 changed files with 6,317 additions and 0 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 dreamgaussian
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/LICENSE_GAUSSIAN_SPLATTING.md b/LICENSE_GAUSSIAN_SPLATTING.md
@@ -0,0 +1,83 @@
+Gaussian-Splatting License  
+===========================  
+
+**Inria** and **the Max Planck Institut for Informatik (MPII)** hold all the ownership rights on the *Software* named **gaussian-splatting**.  
+The *Software* is in the process of being registered with the Agence pour la Protection des  
+Programmes (APP).  
+
+The *Software* is still being developed by the *Licensor*.  
+
+*Licensor*'s goal is to allow the research community to use, test and evaluate  
+the *Software*.  
+
+## 1.  Definitions  
+
+*Licensee* means any person or entity that uses the *Software* and distributes  
+its *Work*.  
+
+*Licensor* means the owners of the *Software*, i.e Inria and MPII  
+
+*Software* means the original work of authorship made available under this  
+License ie gaussian-splatting.  
+
+*Work* means the *Software* and any additions to or derivative works of the  
+*Software* that are made available under this License.  
+
+
+## 2.  Purpose  
+This license is intended to define the rights granted to the *Licensee* by  
+Licensors under the *Software*.  
+
+## 3.  Rights granted  
+
+For the above reasons Licensors have decided to distribute the *Software*.  
+Licensors grant non-exclusive rights to use the *Software* for research purposes  
+to research users (both academic and industrial), free of charge, without right  
+to sublicense.. The *Software* may be used "non-commercially", i.e., for research  
+and/or evaluation purposes only.  
+
+Subject to the terms and conditions of this License, you are granted a  
+non-exclusive, royalty-free, license to reproduce, prepare derivative works of,  
+publicly display, publicly perform and distribute its *Work* and any resulting  
+derivative works in any form.  
+
+## 4.  Limitations  
+
+**4.1 Redistribution.** You may reproduce or distribute the *Work* only if (a) you do  
+so under this License, (b) you include a complete copy of this License with  
+your distribution, and (c) you retain without modification any copyright,  
+patent, trademark, or attribution notices that are present in the *Work*.  
+
+**4.2 Derivative Works.** You may specify that additional or different terms apply  
+to the use, reproduction, and distribution of your derivative works of the *Work*  
+("Your Terms") only if (a) Your Terms provide that the use limitation in  
+Section 2 applies to your derivative works, and (b) you identify the specific  
+derivative works that are subject to Your Terms. Notwithstanding Your Terms,  
+this License (including the redistribution requirements in Section 3.1) will  
+continue to apply to the *Work* itself.  
+
+**4.3** Any other use without of prior consent of Licensors is prohibited. Research  
+users explicitly acknowledge having received from Licensors all information  
+allowing to appreciate the adequacy between of the *Software* and their needs and  
+to undertake all necessary precautions for its execution and use.  
+
+**4.4** The *Software* is provided both as a compiled library file and as source  
+code. In case of using the *Software* for a publication or other results obtained  
+through the use of the *Software*, users are strongly encouraged to cite the  
+corresponding publications as explained in the documentation of the *Software*.  
+
+## 5.  Disclaimer  
+
+THE USER CANNOT USE, EXPLOIT OR DISTRIBUTE THE *SOFTWARE* FOR COMMERCIAL PURPOSES  
+WITHOUT PRIOR AND EXPLICIT CONSENT OF LICENSORS. YOU MUST CONTACT INRIA FOR ANY  
+UNAUTHORIZED USE: [email protected] . ANY SUCH ACTION WILL  
+CONSTITUTE A FORGERY. THIS *SOFTWARE* IS PROVIDED "AS IS" WITHOUT ANY WARRANTIES  
+OF ANY NATURE AND ANY EXPRESS OR IMPLIED WARRANTIES, WITH REGARDS TO COMMERCIAL  
+USE, PROFESSIONNAL USE, LEGAL OR NOT, OR OTHER, OR COMMERCIALISATION OR  
+ADAPTATION. UNLESS EXPLICITLY PROVIDED BY LAW, IN NO EVENT, SHALL INRIA OR THE  
+AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR  
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE  
+GOODS OR SERVICES, LOSS OF USE, DATA, OR PROFITS OR BUSINESS INTERRUPTION)  
+HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT  
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING FROM, OUT OF OR  
+IN CONNECTION WITH THE *SOFTWARE* OR THE USE OR OTHER DEALINGS IN THE *SOFTWARE*.  
diff --git a/__pycache__/cam_utils.cpython-310.pyc b/__pycache__/cam_utils.cpython-310.pyc
diff --git a/__pycache__/gs_renderer.cpython-310.pyc b/__pycache__/gs_renderer.cpython-310.pyc
diff --git a/__pycache__/main.cpython-310.pyc b/__pycache__/main.cpython-310.pyc
diff --git a/__pycache__/mesh.cpython-310.pyc b/__pycache__/mesh.cpython-310.pyc
diff --git a/__pycache__/mesh_renderer.cpython-310.pyc b/__pycache__/mesh_renderer.cpython-310.pyc
diff --git a/cam_utils.py b/cam_utils.py
@@ -0,0 +1,146 @@
+import numpy as np
+from scipy.spatial.transform import Rotation as R
+
+import torch
+
+def dot(x, y):
+    if isinstance(x, np.ndarray):
+        return np.sum(x * y, -1, keepdims=True)
+    else:
+        return torch.sum(x * y, -1, keepdim=True)
+
+
+def length(x, eps=1e-20):
+    if isinstance(x, np.ndarray):
+        return np.sqrt(np.maximum(np.sum(x * x, axis=-1, keepdims=True), eps))
+    else:
+        return torch.sqrt(torch.clamp(dot(x, x), min=eps))
+
+
+def safe_normalize(x, eps=1e-20):
+    return x / length(x, eps)
+
+
+def look_at(campos, target, opengl=True):
+    # campos: [N, 3], camera/eye position
+    # target: [N, 3], object to look at
+    # return: [N, 3, 3], rotation matrix
+    if not opengl:
+        # non-OpenGL convention: the camera looks along its +z axis (towards the target)
+        forward_vector = safe_normalize(target - campos)
+        up_vector = np.array([0, 1, 0], dtype=np.float32)
+        right_vector = safe_normalize(np.cross(forward_vector, up_vector))
+        up_vector = safe_normalize(np.cross(right_vector, forward_vector))
+    else:
+        # OpenGL convention: the camera looks along its -z axis, so +z points from target to camera
+        forward_vector = safe_normalize(campos - target)
+        up_vector = np.array([0, 1, 0], dtype=np.float32)
+        right_vector = safe_normalize(np.cross(up_vector, forward_vector))
+        up_vector = safe_normalize(np.cross(forward_vector, right_vector))
+    R = np.stack([right_vector, up_vector, forward_vector], axis=1)
+    return R
+
+
+# elevation & azimuth to pose (cam2world) matrix
+def orbit_camera(elevation, azimuth, radius=1, is_degree=True, target=None, opengl=True):
+    # radius: scalar
+    # elevation: scalar, in (-90, 90), from +y to -y is (-90, 90)
+    # azimuth: scalar, in (-180, 180), from +z to +x is (0, 90)
+    # return: [4, 4], camera pose matrix
+    if is_degree:
+        elevation = np.deg2rad(elevation)
+        azimuth = np.deg2rad(azimuth)
+    x = radius * np.cos(elevation) * np.sin(azimuth)
+    y = - radius * np.sin(elevation)
+    z = radius * np.cos(elevation) * np.cos(azimuth)
+    if target is None:
+        target = np.zeros([3], dtype=np.float32)
+    campos = np.array([x, y, z]) + target  # [3]
+    T = np.eye(4, dtype=np.float32)
+    T[:3, :3] = look_at(campos, target, opengl)
+    T[:3, 3] = campos
+    return T
+
+
+class OrbitCamera:
+    def __init__(self, W, H, r=2, fovy=60, near=0.01, far=100):
+        self.W = W
+        self.H = H
+        self.radius = r  # camera distance from center
+        self.fovy = np.deg2rad(fovy)  # deg 2 rad
+        self.near = near
+        self.far = far
+        self.center = np.array([0, 0, 0], dtype=np.float32)  # look at this point
+        self.rot = R.from_matrix(np.eye(3))
+        self.up = np.array([0, 1, 0], dtype=np.float32)  # must be normalized!
+
+    @property
+    def fovx(self):
+        return 2 * np.arctan(np.tan(self.fovy / 2) * self.W / self.H)
+
+    @property
+    def campos(self):
+        return self.pose[:3, 3]
+
+    # pose (c2w)
+    @property
+    def pose(self):
+        # first move camera to radius
+        res = np.eye(4, dtype=np.float32)
+        res[2, 3] = self.radius  # opengl convention...
+        # rotate
+        rot = np.eye(4, dtype=np.float32)
+        rot[:3, :3] = self.rot.as_matrix()
+        res = rot @ res
+        # translate
+        res[:3, 3] -= self.center
+        return res
+
+    # view (w2c)
+    @property
+    def view(self):
+        return np.linalg.inv(self.pose)
+
+    # projection (perspective)
+    @property
+    def perspective(self):
+        y = np.tan(self.fovy / 2)
+        aspect = self.W / self.H
+        return np.array(
+            [
+                [1 / (y * aspect), 0, 0, 0],
+                [0, -1 / y, 0, 0],
+                [
+                    0,
+                    0,
+                    -(self.far + self.near) / (self.far - self.near),
+                    -(2 * self.far * self.near) / (self.far - self.near),
+                ],
+                [0, 0, -1, 0],
+            ],
+            dtype=np.float32,
+        )
+
+    # intrinsics
+    @property
+    def intrinsics(self):
+        focal = self.H / (2 * np.tan(self.fovy / 2))
+        return np.array([focal, focal, self.W // 2, self.H // 2], dtype=np.float32)
+
+    @property
+    def mvp(self):
+        return self.perspective @ self.view  # [4, 4]
+
+    def orbit(self, dx, dy):
+        # rotate along camera up/side axis!
+        side = self.rot.as_matrix()[:3, 0]
+        rotvec_x = self.up * np.radians(-0.05 * dx)
+        rotvec_y = side * np.radians(-0.05 * dy)
+        self.rot = R.from_rotvec(rotvec_x) * R.from_rotvec(rotvec_y) * self.rot
+
+    def scale(self, delta):
+        self.radius *= 1.1 ** (-delta)
+
+    def pan(self, dx, dy, dz=0):
+        # pan in the camera coordinate system (be careful with the sensitivity!)
+        self.center += 0.0005 * self.rot.as_matrix()[:3, :3] @ np.array([-dx, -dy, dz])
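
Note on cam_utils.py: the spherical-to-Cartesian mapping in `orbit_camera` and the field-of-view relations in `OrbitCamera.fovx` and `OrbitCamera.intrinsics` can be checked with the standard library alone. The sketch below copies the formulas from the file above; the helper names `orbit_position` and `fov_and_focal` are hypothetical and are not part of this commit.

```python
import math


def orbit_position(elevation_deg, azimuth_deg, radius=1.0, target=(0.0, 0.0, 0.0)):
    """Camera position for an orbit around `target` (hypothetical helper).

    Mirrors the formulas in orbit_camera: negative elevation moves the
    camera towards +y, and azimuth 0 -> 90 sweeps from +z to +x.
    """
    elevation = math.radians(elevation_deg)
    azimuth = math.radians(azimuth_deg)
    x = radius * math.cos(elevation) * math.sin(azimuth)
    y = -radius * math.sin(elevation)
    z = radius * math.cos(elevation) * math.cos(azimuth)
    return (x + target[0], y + target[1], z + target[2])


def fov_and_focal(width, height, fovy_deg):
    """Return (fovx in degrees, focal length in pixels) (hypothetical helper)."""
    fovy = math.radians(fovy_deg)
    # Horizontal FOV follows from the vertical FOV and the aspect ratio.
    fovx = 2 * math.atan(math.tan(fovy / 2) * width / height)
    # Focal length in pixels, derived from the image height as in `intrinsics`.
    focal = height / (2 * math.tan(fovy / 2))
    return math.degrees(fovx), focal
```

With the values from `configs/image.yaml` (`radius: 2`, `elevation: 0`), `orbit_position(0, 0, radius=2)` places the camera on the +z axis at distance 2 from the origin. For a square viewport, `fov_and_focal` returns a horizontal FOV equal to the vertical one.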
diff --git a/configs/image.yaml b/configs/image.yaml
@@ -0,0 +1,69 @@
+### Input
+# input rgba image path (default to None, can be loaded in GUI too)
+input: 
+# input text prompt (default to None, can be input in GUI too)
+prompt:
+# input mesh for stage 2 (auto-search from stage 1 output path if None)
+mesh:
+# estimated elevation angle for input image 
+elevation: 0
+# reference image resolution
+ref_size: 256
+# density thresh for mesh extraction
+density_thresh: 1
+
+### Output
+outdir: logs
+mesh_format: obj
+save_path: ???
+
+### Training
+# guidance loss weights (0 to disable)
+lambda_sd: 0
+lambda_zero123: 1
+# training batch size per iter
+batch_size: 1
+# training iterations for stage 1
+iters: 500
+# training iterations for stage 2
+iters_refine: 50
+# training camera radius
+radius: 2
+# training camera fovy
+fovy: 49.1 # align with zero123 rendering setting (ref: https://github.com/cvlab-columbia/zero123/blob/main/objaverse-rendering/scripts/blender_script.py#L61)
+# checkpoint to load for stage 1 (should be a ply file)
+load:
+# whether to allow geometry training in stage 2
+train_geo: False
+# prob to invert background color during training (0 = always black, 1 = always white)
+invert_bg_prob: 0.5
+
+
+### GUI
+gui: False
+force_cuda_rast: False
+# GUI resolution
+H: 800
+W: 800
+
+### Gaussian splatting
+num_pts: 5000
+sh_degree: 0
+position_lr_init: 0.001
+position_lr_final: 0.00002
+position_lr_delay_mult: 0.02
+position_lr_max_steps: 500
+feature_lr: 0.01
+opacity_lr: 0.05
+scaling_lr: 0.005
+rotation_lr: 0.005
+percent_dense: 0.1
+density_start_iter: 100
+density_end_iter: 3000
+densification_interval: 100
+opacity_reset_interval: 700
+densify_grad_threshold: 0.5
+
+### Textured Mesh
+geom_lr: 0.0001
+texture_lr: 0.2
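
Note on the `position_lr_*` keys: how they are consumed is not shown in this part of the diff. Assuming they follow the log-linear decay used in the reference 3D Gaussian Splatting implementation, with no delay period configured (so `position_lr_delay_mult` has no effect), the schedule would look roughly like the sketch below. The function name `position_lr` is hypothetical.

```python
import math


def position_lr(step, lr_init=0.001, lr_final=0.00002, max_steps=500):
    """Log-linear interpolation from lr_init to lr_final (assumed schedule)."""
    t = min(max(step / max_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return math.exp(math.log(lr_init) * (1 - t) + math.log(lr_final) * t)


# Print the rate at the start, midpoint and end of stage 1.
for step in (0, 250, 500):
    print(step, f"{position_lr(step):.6f}")
```

Under this assumption the rate halfway through training is the geometric mean of `position_lr_init` and `position_lr_final`, and it stays at `position_lr_final` for any step past `position_lr_max_steps`.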
diff --git a/configs/text.yaml b/configs/text.yaml
@@ -0,0 +1,68 @@
+### Input
+# input rgba image path (default to None, can be loaded in GUI too)
+input: 
+# input text prompt (default to None, can be input in GUI too)
+prompt:
+# input mesh for stage 2 (auto-search from stage 1 output path if None)
+mesh:
+# estimated elevation angle for input image 
+elevation: 0
+# reference image resolution
+ref_size: 256
+# density thresh for mesh extraction
+density_thresh: 1
+
+### Output
+outdir: logs
+mesh_format: obj
+save_path: ???
+
+### Training
+# guidance loss weights (0 to disable)
+lambda_sd: 1
+lambda_zero123: 0
+# training batch size per iter
+batch_size: 1
+# training iterations for stage 1
+iters: 500
+# training iterations for stage 2
+iters_refine: 50
+# training camera radius
+radius: 2.5
+# training camera fovy
+fovy: 49.1
+# checkpoint to load for stage 1 (should be a ply file)
+load:
+# whether to allow geometry training in stage 2
+train_geo: False
+# prob to invert background color during training (0 = always black, 1 = always white)
+invert_bg_prob: 0.5
+
+### GUI
+gui: False
+force_cuda_rast: False
+# GUI resolution
+H: 800
+W: 800
+
+### Gaussian splatting
+num_pts: 1000
+sh_degree: 0
+position_lr_init: 0.001
+position_lr_final: 0.00002
+position_lr_delay_mult: 0.02
+position_lr_max_steps: 500
+feature_lr: 0.01
+opacity_lr: 0.05
+scaling_lr: 0.005
+rotation_lr: 0.005
+percent_dense: 0.1
+density_start_iter: 100
+density_end_iter: 3000
+densification_interval: 50
+opacity_reset_interval: 700
+densify_grad_threshold: 0.01
+
+### Textured Mesh
+geom_lr: 0.0001
+texture_lr: 0.2
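
Note on the densification keys: the two configs differ in `densification_interval` (100 for image, 50 for text) while both train for `iters: 500`. The trainer that reads these keys is not part of this excerpt. Assuming steps are counted from 1 to `iters` and that densification fires when the step lies in `[density_start_iter, density_end_iter]` and is a multiple of the interval, the effect of the two settings can be sketched as follows. `densify_steps` is a hypothetical helper.

```python
def densify_steps(iters, start, end, interval):
    """Steps in 1..iters on which densification would fire (assumed rule)."""
    return [
        step
        for step in range(1, iters + 1)
        if start <= step <= end and step % interval == 0
    ]


# configs/image.yaml: densification_interval = 100
image_steps = densify_steps(iters=500, start=100, end=3000, interval=100)
# configs/text.yaml: densification_interval = 50
text_steps = densify_steps(iters=500, start=100, end=3000, interval=50)
print(len(image_steps), len(text_steps))
```

If the same rule applies to `opacity_reset_interval: 700`, the reset would never trigger within 500 iterations, and `density_end_iter: 3000` would not be the limiting bound for either config.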
diff --git a/data/anya_rgba.png b/data/anya_rgba.png
diff --git a/data/burger-2_rgba.png b/data/burger-2_rgba.png
diff --git a/data/catstatue_rgba.png b/data/catstatue_rgba.png
diff --git a/data/csm_luigi_rgba.png b/data/csm_luigi_rgba.png
diff --git a/data/test.png b/data/test.png
diff --git a/data/zelda_rgba.png b/data/zelda_rgba.png
diff --git a/diff-gaussian-rasterization b/diff-gaussian-rasterization