Skip to content
forked from ybgdgh/L3MVN

Leveraging Large Language Models for Visual Target Navigation

Notifications You must be signed in to change notification settings

edwardjjj/L3MVN

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

L3MVN: Leveraging Large Language Models for Visual Target Navigation

This work is based on our paper. We proposed a new framework to explore and search for the target in unknown environment based on Large Language Model. Our work is based on SemExp and llm_scene_understanding, implemented in PyTorch.

Author: Bangguo Yu, Hamidreza Kasaei and Ming Cao

Affiliation: University of Groningen

Frontier Semantic Exploration Framework

Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning the priors during the training and typically require significant expensive resources and time for learning. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLM) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.

image-20200706200822807

Docker

  1. Clone repository
git clone https://github.com/edwardjjj/l3mvn.git l3mvn
  1. Install utility function to download hm3d dataset
conda create -n l3mvn
conda install habitat-sim-challenge-2022 headless -c conda-forge -c aihabitat -n l3mvn
conda activate l3mvn
python -m habitat_sim.utils.datasets_download --username <api-token-id> --password <api-token-secret> --uids hm3d --data-path l3mvn/data
ln -s -f ../versioned_data/hm3d-1.0/hm3d l3mvn/data/scene_datasets/hm3d
  1. Download segmentation model from here. Put the downloaded file in l3mvn/data.

  2. Download test set from here. Unzip and rename the folder to objectgoal_hm3d and place it in l3mvn/data

  3. Build Docker image

docker build -t l3mvn:1.0 .
  1. Run the image
docker run --gpus all -v .:/home/app -it l3mvn:1.0

inside the container run the following to test the feed-forward method

cd l3mvn
. activate habitat
python main_llm_vis.py --split val --eval 1 --auto_gpu_config 0 \
-n 8 --num_eval_episodes 250 --load pretrained_models/llm_model.pt \
--use_gtsem 0 --num_local_steps 10

run the following to test the zero-shot method

cd l3mvn
. activate habitat
python main_llm_zeroshot.py --split val --eval 1 --auto_gpu_config 0 \
-n 5 --num_eval_episodes 400 --num_processes_on_first_gpu 5 \
--use_gtsem 0 --num_local_steps 10 --exp_name exp_llm_hm3d_zero \

Installation

The code has been tested only with Python 3.7 on Ubuntu 20.04.

  1. Installing Dependencies
git clone https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim; git checkout tags/challenge-2022; 
pip install -r requirements.txt; 
python setup.py install --headless
python setup.py install # (for Mac OS)
  • Installing habitat-lab:
git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab; git checkout tags/challenge-2022; 
pip install -e .
  • Install pytorch according to your system configuration. The code is tested on pytorch v1.7.0 and cudatoolkit v11.4. If you are using conda:
conda install pytorch==1.7.0 torchvision==0.8.1 cudatoolkit=11.4 #(Linux with GPU)
conda install pytorch==1.7.0 torchvision==0.8.1 -c pytorch #(Mac OS)
  • Install detectron2 according to your system configuration.
  1. Download HM3D datasets:

Habitat Matterport

Download HM3D dataset using download utility and instructions:

python -m habitat_sim.utils.datasets_download --username <api-token-id> --password <api-token-secret> --uids hm3d_minival
  1. Download additional datasets

Download the segmentation model in RedNet/model path.

Setup

Clone the repository and install other requirements:

git clone https://github.com/ybgdgh/L3MVN
cd L3MVN/
pip install -r requirements.txt

Setting up datasets

The code requires the datasets in a data folder in the following format (same as habitat-lab):

L3MVN/
  data/
    scene_datasets/
    matterport_category_mappings.tsv
    object_norm_inv_perplexity.npy
    versioned_data
    objectgoal_hm3d/
        train/
        val/
        val_mini/

For evaluation:

For evaluating the pre-trained model:

python main_llm_vis.py --split val --eval 1 --auto_gpu_config 0 \
-n 1 --num_eval_episodes 2000 --load pretrained_models/llm_model.pt \
--use_gtsem 0 --num_local_steps 10

Demo Video

video

About

Leveraging Large Language Models for Visual Target Navigation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.8%
  • Other 1.2%