[2023/12/18] 🔥 We released the code, the Osprey-7b model and the online demo for Osprey.
Osprey is a mask-text instruction tuning approach that extends MLLMs by incorporating pixel-wise mask regions into language instructions, enabling fine-grained visual understanding. Given an input mask region, Osprey generates semantic descriptions at two levels of granularity: a short description and a detailed description.
Osprey can seamlessly integrate with SAM in point-prompt, box-prompt and segment-everything modes to generate the semantics associated with specific parts or objects.
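As a concrete illustration of this flow, the sketch below uses the `segment_anything` API to obtain a pixel-wise mask from a point prompt; the closing comment marks where that mask and a language instruction would be handed to Osprey (the actual entry point lives in the demo code, so no Osprey call is shown). The image path, click coordinates, and checkpoint path are placeholder assumptions, and OpenCV is assumed to be installed.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-B SAM checkpoint used by the demo (path is an assumption).
sam = sam_model_registry["vit_b"](checkpoint="checkpoints/sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Read an image and segment the object/part under a clicked point.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # placeholder click position (x, y)
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=False,
)
mask = masks[0]  # binary mask of the selected region

# This pixel-wise mask, paired with a text instruction such as
# "Describe this region in detail.", is what Osprey consumes to produce
# the short and detailed semantic descriptions.
```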
Click 👇 to try our demo online.
username: osprey
password: osprey
| Point | Box | Everything |
| :---: | :---: | :---: |
| ![]() | ![]() | ![]() |
- First install Gradio-Osprey-Demo.
- Install Segment Anything.
  ```
  pip install git+https://github.com/facebookresearch/segment-anything.git
  ```
- Download the ViT-B SAM model to `checkpoints` (a Python download sketch follows this list).
- Run `app.py`.
  ```
  cd demo
  python app.py --model checkpoint/osprey_7b
  ```
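If you prefer to script the checkpoint download, the following is a minimal Python sketch; the URL is the ViT-B checkpoint published by the Segment Anything repository, and the `checkpoints` directory matches the step above.

```python
import urllib.request
from pathlib import Path

# ViT-B SAM checkpoint URL from the Segment Anything release.
SAM_VIT_B_URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"

ckpt_dir = Path("checkpoints")
ckpt_dir.mkdir(exist_ok=True)
ckpt_path = ckpt_dir / "sam_vit_b_01ec64.pth"

if not ckpt_path.exists():
    print(f"Downloading ViT-B SAM checkpoint to {ckpt_path} ...")
    urllib.request.urlretrieve(SAM_VIT_B_URL, str(ckpt_path))
```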
- Clone this repository and navigate to the Osprey folder.
  ```
  git clone https://github.com/CircleRadon/Osprey.git
  cd Osprey
  ```
- Install packages.
  ```
  conda create -n osprey python=3.10 -y
  conda activate osprey
  pip install --upgrade pip  # enable PEP 660 support
  pip install -e .
  ```
- Install additional packages for training cases.
  ```
  pip install -e ".[train]"
  pip install flash-attn --no-build-isolation
  ```
Then change the `"mm_vision_tower"` entry in the `config.json` of the Osprey-7b model to the path of the Convnext-large-CLIP-model.
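A minimal sketch of that edit is shown below; the checkpoint directory and the ConvNeXt-Large CLIP path are assumptions, so adjust both to your local layout.

```python
import json
from pathlib import Path

# Assumed locations; point these at your own Osprey-7b checkpoint and
# ConvNeXt-Large CLIP model directory.
config_path = Path("checkpoint/osprey_7b/config.json")
convnext_clip_path = "/path/to/convnext-large-clip-model"

config = json.loads(config_path.read_text())
config["mm_vision_tower"] = convnext_clip_path
config_path.write_text(json.dumps(config, indent=2))
```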
- Release the checkpoints, inference codes and demo.
- Release the dataset and training scripts.
- Release the evaluation code.
- Release the code for the data generation pipeline.
- LLaVA-v1.5: the codebase we built upon.
- SAM: the demo uses segmentation results from SAM as the input to Osprey.
@misc{Osprey,
title={Osprey: Pixel Understanding with Visual Instruction Tuning},
author={Yuqian Yuan and Wentong Li and Jian Liu and Dongqi Tang and Xinjie Luo and Chi Qin and Lei Zhang and Jianke Zhu},
year={2023},
eprint={2312.10032},
archivePrefix={arXiv},
primaryClass={cs.CV}
}