
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

Website | Hugging Face | Video

Official implementation of FLARE (CVPR 2025) - a feed-forward model for joint camera pose estimation, 3D reconstruction and novel view synthesis from sparse uncalibrated views.

Teaser Video

📖 Overview

We present FLARE, a feed-forward model that simultaneously estimates high-quality camera poses, 3D geometry, and appearance from sparse sets of 2-8 uncalibrated images. Our cascaded learning paradigm proceeds in three stages (see the sketch after this list):

  1. Camera Pose Estimation: Directly regress camera poses without bundle adjustment
  2. Geometry Reconstruction: Decompose geometry reconstruction into two simpler sub-problems
  3. Appearance Modeling: Enable photorealistic novel view synthesis via 3D Gaussians

FLARE achieves state-of-the-art performance with inference times under 0.5 seconds.
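To make the cascade concrete, here is a minimal, runnable sketch of the data flow. All function names, shapes, and return values are hypothetical placeholders for illustration, not the actual FLARE API.

import numpy as np

def estimate_poses(images):
    # Stage 1 (hypothetical stub): regress one camera-to-world pose per view
    # directly from image features, with no bundle adjustment.
    return np.tile(np.eye(4), (len(images), 1, 1))

def reconstruct_geometry(images, poses):
    # Stage 2 (hypothetical stub): predict per-view point maps conditioned on
    # the estimated poses, then fuse them into a global point cloud.
    h, w = images.shape[1:3]
    return np.zeros((len(images), h, w, 3))

def predict_gaussians(images, poses, points):
    # Stage 3 (hypothetical stub): lift the geometry into 3D Gaussians
    # (means, covariances, opacities, colors) for novel view synthesis.
    return {"means": points.reshape(-1, 3)}

views = np.zeros((7, 384, 512, 3), dtype=np.float32)  # 2-8 uncalibrated images
poses = estimate_poses(views)
points = reconstruct_geometry(views, poses)
gaussians = predict_gaussians(views, poses, points)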

🛠️ TODO List

  • Release point cloud and camera pose estimation code.
  • Update the Gradio demo (app.py).
  • Release novel view synthesis code.
  • Release evaluation code.
  • Release training code.
  • Release data processing code.

🌍 Installation

conda create -n flare python=3.8
conda activate flare 
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the correct version of cuda for your system
pip install -r requirements.txt
conda uninstall ffmpeg  # replace the default ffmpeg with the conda-forge build
conda install -c conda-forge ffmpeg
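After installing, a quick sanity check that PyTorch can see your GPU:

python -c "import torch; print(torch.__version__, torch.version.cuda); print('CUDA available:', torch.cuda.is_available())"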

💿 Checkpoints

Download the checkpoints from huggingface_geometry and huggingface_nvs, and place them at checkpoints/geometry_pose.pth and checkpoints/NVS.pth respectively.
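If you prefer to fetch the weights programmatically, a minimal sketch with huggingface_hub is below. The repo IDs are placeholders; substitute the actual repositories behind the huggingface_geometry and huggingface_nvs links.

from huggingface_hub import hf_hub_download

# Placeholder repo IDs -- replace with the Hugging Face repos linked above.
hf_hub_download(repo_id="your-org/flare-geometry",  # placeholder
                filename="geometry_pose.pth",
                local_dir="checkpoints")
hf_hub_download(repo_id="your-org/flare-nvs",       # placeholder
                filename="NVS.pth",
                local_dir="checkpoints")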

🎯 Run a Demo (Point Cloud and Camera Pose Estimation)

sh scripts/run_pose_pointcloud.sh
torchrun --nproc_per_node=1 run_pose_pointcloud.py \
    --test_dataset "1 @ CustomDataset(split='train', ROOT='Your/Data/Path', resolution=(512,384), seed=1, num_views=7, gt_num_image=0, aug_portrait_or_landscape=False, sequential_input=False)" \
    --model "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf))" \
    --pretrained "Your/Checkpoint/Path" \
    --test_criterion "MeshOutput(sam=False)" --output_dir "log/" --amp 1 --seed 1 --num_workers 0
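Before launching, it can help to verify that your input folder matches the dataset settings. A minimal check, assuming ROOT is a flat folder of images (the actual CustomDataset layout may differ):

from pathlib import Path

root = Path("Your/Data/Path")
images = sorted(p for p in root.iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
print(f"found {len(images)} images")
assert len(images) >= 7, "CustomDataset(num_views=7) needs at least 7 input views"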

To run the demo with ground-truth camera poses, enable the wpose=True flag in both CustomDataset and AsymmetricMASt3R. An example script demonstrating this setup is provided in scripts/run_pose_pointcloud_wpose.sh:

sh scripts/run_pose_pointcloud_wpose.sh

👀 Visualization

sh ./visualizer/vis.sh
CUDA_VISIBLE_DEVICES=0 python visualizer/run_vis.py --result_npz data/mesh/IMG_1511.HEIC.JPG.JPG/pred.npz --results_folder data/mesh/IMG_1511.HEIC.JPG.JPG/
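If you want to inspect the prediction file directly, pred.npz is a standard NumPy archive. The key names inside it are not documented here, so this sketch simply lists whatever arrays are present:

import numpy as np

# Load the prediction archive written by the demo and list its arrays.
data = np.load("data/mesh/IMG_1511.HEIC.JPG.JPG/pred.npz")
for key in data.files:
    print(key, data[key].shape, data[key].dtype)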

📽️ Evaluating Novel View Synthesis

You can evaluate novel view synthesis on the RealEstate10K dataset using the following command:

sh scripts/run_eval_nvs.sh
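Novel view synthesis is typically scored with image metrics such as PSNR. For reference, a minimal PSNR implementation (an assumption about the standard metric, not a claim about this repo's evaluation code):

import numpy as np

def psnr(pred, gt, max_val=1.0):
    # Peak signal-to-noise ratio between a rendered and a ground-truth image,
    # both as float arrays with values in [0, max_val].
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

print(psnr(np.full((64, 64, 3), 0.5), np.full((64, 64, 3), 0.55)))  # ~26 dB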

RealEstate10K Dataset

Our model uses the same training and test datasets as pixelSplat. Below we quote pixelSplat's instructions for obtaining them.

pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.

If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
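To verify a downloaded subset, you can inspect the chunks directly. This sketch assumes the chunks unzip to datasets/re10k/test/*.torch and are torch-serialized objects (a guess based on the description above; check pixelSplat's linked scripts for the authoritative layout):

from pathlib import Path
import torch

# Assumed layout: datasets/re10k/test/*.torch, each a torch-serialized chunk.
for chunk_path in sorted(Path("datasets/re10k/test").glob("*.torch")):
    chunk = torch.load(chunk_path)
    print(chunk_path.name, type(chunk), len(chunk) if hasattr(chunk, "__len__") else "")
    break  # inspect just the first chunk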

📜 Citation

@misc{zhang2025flarefeedforwardgeometryappearance,
      title={FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views}, 
      author={Shangzhan Zhang and Jianyuan Wang and Yinghao Xu and Nan Xue and Christian Rupprecht and Xiaowei Zhou and Yujun Shen and Gordon Wetzstein},
      year={2025},
      eprint={2502.12138},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.12138}, 
}

🙏 Acknowledgement

FLARE is built on outstanding open-source projects. We are grateful to these projects and their communities, whose hard work has greatly advanced the field and made our work possible.
