Official implementation of FLARE (CVPR 2025), a feed-forward model for joint camera pose estimation, 3D reconstruction, and novel view synthesis from sparse, uncalibrated views.
- 📖 Overview
- 🛠️ TODO List
- 🌍 Installation
- 💿 Checkpoints
- 🎯 Run a Demo (Point Cloud and Camera Pose Estimation)
- 📽️ Evaluating Novel View Synthesis
- 👀 Visualization
- 📜 Citation
We present FLARE, a feed-forward model that simultaneously estimates high-quality camera poses, 3D geometry, and appearance from just 2-8 uncalibrated images. Our cascaded learning paradigm proceeds in three stages:
- Camera Pose Estimation: Directly regress camera poses without bundle adjustment
- Geometry Reconstruction: Decompose geometry reconstruction into two simpler sub-problems
- Appearance Modeling: Enable photorealistic novel view synthesis via 3D Gaussians
FLARE achieves state-of-the-art performance with an inference time under 0.5 seconds!
- Release point cloud and camera pose estimation code.
- Updated Gradio demo (app.py).
- Release novel view synthesis code.
- Release evaluation code.
- Release training code.
- Release data processing code.
conda create -n flare python=3.8
conda activate flare
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia # use the correct version of cuda for your system
pip install -r requirements.txt
conda uninstall ffmpeg
conda install -c conda-forge ffmpeg
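After installation, a quick way to confirm the environment is healthy is to check that PyTorch sees your GPU. This is an optional sanity check, not part of the official setup:

```python
# Optional sanity check: confirm PyTorch and CUDA are correctly installed.
import torch

print(torch.__version__)          # version installed by conda above
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # should print True on a machine with a working GPU driver
```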
Download the checkpoints from huggingface_geometry and huggingface_nvs, and place them at checkpoints/geometry_pose.pth and checkpoints/NVS.pth, respectively.
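If you prefer to fetch the checkpoints programmatically, a minimal sketch using `huggingface_hub` is shown below. The `repo_id` values are placeholders rather than the actual repository names, and the filenames are assumed to match the target paths above; substitute the repositories linked as huggingface_geometry and huggingface_nvs:

```python
# Sketch only: download the two checkpoints into ./checkpoints with huggingface_hub.
# The repo_id strings are placeholders -- replace them with the repositories linked above,
# and adjust the filenames if they differ on the Hugging Face side.
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="GEOMETRY_REPO_ID_HERE", filename="geometry_pose.pth", local_dir="checkpoints")
hf_hub_download(repo_id="NVS_REPO_ID_HERE", filename="NVS.pth", local_dir="checkpoints")
```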
sh scripts/run_pose_pointcloud.sh
torchrun --nproc_per_node=1 run_pose_pointcloud.py \
--test_dataset "1 @ CustomDataset(split='train', ROOT='Your/Data/Path', resolution=(512,384), seed=1, num_views=7, gt_num_image=0, aug_portrait_or_landscape=False, sequential_input=False)" \
--model "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf))" \
--pretrained "Your/Checkpoint/Path" \
--test_criterion "MeshOutput(sam=False)" --output_dir "log/" --amp 1 --seed 1 --num_workers 0
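Before launching, you can optionally verify that the checkpoint passed via --pretrained loads cleanly. This is a generic PyTorch check, not part of the FLARE scripts, and it assumes the checkpoint path from the Checkpoints section:

```python
# Optional sanity check: confirm the checkpoint file loads on CPU before launching torchrun.
import torch

ckpt = torch.load("checkpoints/geometry_pose.pth", map_location="cpu")  # path assumed from the Checkpoints section
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys())[:10])
else:
    print("checkpoint object type:", type(ckpt))
```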
To run the demo with ground-truth camera poses, enable the wpose=True flag in both CustomDataset and AsymmetricMASt3R. An example script demonstrating this setup is provided in run_pose_pointcloud_wpose.sh.
sh scripts/run_pose_pointcloud_wpose.sh
sh ./visualizer/vis.sh
CUDA_VISIBLE_DEVICES=0 python visualizer/run_vis.py --result_npz data/mesh/IMG_1511.HEIC.JPG.JPG/pred.npz --results_folder data/mesh/IMG_1511.HEIC.JPG.JPG/
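The visualizer reads the pred.npz file written by the demo. If you want to inspect it directly, a minimal NumPy sketch is shown below; the exact keys stored in the archive depend on the repository's export code, so we only list them rather than assume their names:

```python
# Sketch: list the arrays stored in the demo's pred.npz output.
import numpy as np

pred = np.load("data/mesh/IMG_1511.HEIC.JPG.JPG/pred.npz", allow_pickle=True)
for key in pred.files:
    value = pred[key]
    print(key, getattr(value, "shape", type(value)))
```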
You can evaluate novel view synthesis on the RealEstate10K dataset with the following command:
sh scripts/run_eval_nvs.sh
Our model uses the same training and test datasets as pixelSplat. Below we quote pixelSplat's detailed instructions for obtaining them.
pixelSplat was trained using versions of the Real Estate 10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created `datasets` folder in the project root directory.
If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
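As a convenience, a minimal sketch for unpacking the downloaded subset archives into the `datasets` folder is shown below. It assumes the zip files sit in the project root; the pixelSplat links above remain the authoritative instructions:

```python
# Sketch: create ./datasets and extract the downloaded RealEstate10k / ACID subset zips into it.
import zipfile
from pathlib import Path

datasets_dir = Path("datasets")
datasets_dir.mkdir(exist_ok=True)

for archive in Path(".").glob("*.zip"):  # assumes the subset zips were downloaded to the project root
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(datasets_dir)
        print(f"extracted {archive.name} -> {datasets_dir}/")
```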
@misc{zhang2025flarefeedforwardgeometryappearance,
title={FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views},
author={Shangzhan Zhang and Jianyuan Wang and Yinghao Xu and Nan Xue and Christian Rupprecht and Xiaowei Zhou and Yujun Shen and Gordon Wetzstein},
year={2025},
eprint={2502.12138},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.12138},
}
FLARE is built on outstanding open-source projects. We are extremely grateful to these projects and their communities, whose hard work has greatly propelled the field forward and made our work possible.