Mariia Khan⋆†, Yue Qiu⋆, Yuren Cong†, Bodo Rosenhahn†, David Suter⋆†, Jumana Abu-Khalaf⋆†
⋆† School of Science, Edith Cowan University (ECU), Australia
⋆ Artificial Intelligence Research Center (AIRC), AIST, Japan
† Institute for Information Processing, Leibniz University of Hannover (LUH), Germany
Paper to be published soon - accepted to IROS 2024.
Understanding scene changes is crucial for embodied AI applications, such as visual room rearrangement, where the agent must revert changes by restoring objects to their original locations or states. Visual changes between two scenes, pre- and post-rearrangement, encompass two tasks: scene change detection (locating changes) and image difference captioning (describing changes). Previous methods, focused on sequential 2D images, have addressed these tasks separately, yet combining them is essential. Therefore, we propose a new Scene Change Understanding (SCU) task for simultaneous change detection and description. Moreover, we go beyond generating language descriptions of changes and aim to generate rearrangement instructions that allow the robotic agent to revert the changes. To solve this task, we propose a novel method, EmbSCU, which compares instance-level object masks of changed objects (covering 53 frequently seen indoor object classes) before and after the changes and generates rearrangement language instructions for the agent. EmbSCU is built on our Segment Any Object Model (SAOM) - a fine-tuned version of the Segment Anything Model (SAM), adapted to obtain instance-level object masks for both foreground and background objects in indoor embodied environments. EmbSCU is evaluated on our own dataset of sequential 2D image pairs before and after changes, collected from the Ai2Thor simulator. The proposed framework achieves promising results in both change detection and change description. Moreover, EmbSCU demonstrates positive generalization results on real-world scenes without using any real-life data during training.
The code for training, testing and evaluation of EmbSCU was released on 09.01.25.
The EmbSCU datasets will be released shortly.
To begin, clone this repository locally:
```bash
git clone git@github.com:mariiak2021/EmbSCU.git
```
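Then change into the cloned directory (assuming the default clone folder name) before setting up the environment:
```bash
cd EmbSCU
```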
Here's a quick summary of the most important files/directories in this repository:
- `environment.yml` - the file with all requirements to set up the conda environment
- `checkpoints` - the folder with the different model checkpoints used for evaluation in the paper
- `spano_code`
- `eval_results` - the folder to keep evaluation results of any model variation
- `results` - the folder to keep checkpoints from model training
- `MCC` - the folder with the MCC-Formers training/evaluation implementation
- `EmbSCU2` and `EmbSCU` - the folders with the different implementations of EmbSCU training/evaluation
- `train.py` - the file to train the model
- `eval.py` - the file to evaluate the model performance
- all other files are necessary to process the EmbSCU dataset
You can then install the requirements using conda. We can create an `embclone` environment with our requirements by running:
```bash
export MY_ENV_NAME=embclone
export CONDA_BASE="$(dirname $(dirname "${CONDA_EXE}"))"
export PIP_SRC="${CONDA_BASE}/envs/${MY_ENV_NAME}/pipsrc"
conda env create --file environment.yml --name $MY_ENV_NAME
```
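Once the environment is created, activate it before running any of the scripts below (standard conda usage; the environment name follows from the setup above):
```bash
conda activate embclone
```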
To train the model on several GPUs run:
```bash
python train.py
```
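To restrict training to specific GPUs, a common approach is to limit the visible CUDA devices before launching (a sketch, assuming `train.py` uses all visible devices):
```bash
# Example: train on GPUs 0 and 1 only (assumes train.py picks up all visible devices)
CUDA_VISIBLE_DEVICES=0,1 python train.py
```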
To evaluate the model run:
```bash
python eval.py
```
After you get the output captions for evaluation, run:
```bash
python eval_accuracy.py
```
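A full evaluation pass therefore chains the two steps, generating captions first and then scoring them (a sketch; both scripts are assumed to be configured inside the repository, with no extra flags):
```bash
# Hypothetical end-to-end evaluation: generate output captions, then compute accuracy
python eval.py && python eval_accuracy.py
```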
To run the re-trained model in the everything mode run:
```bash
python tesbatch.py
```