code and model for the paper Neural Video Compression with Diverse Contexts in CVPR 2023.
ustclibin authored and yaohualibin committed Mar 1, 2023
1 parent ef9a2db commit 7df42d3
Showing 37 changed files with 4,840 additions and 0 deletions.
137 changes: 137 additions & 0 deletions CVPR2023/README.md
@@ -0,0 +1,137 @@
# Introduction

Official PyTorch implementation for [Neural Video Compression with Diverse Contexts](https://arxiv.org/abs/2302.14402), in CVPR 2023.

# Prerequisites
* Python 3.8 and conda, get [Conda](https://www.anaconda.com/)
* CUDA, if you want to use a GPU
* Environment
```
conda create -n $YOUR_PY38_ENV_NAME python=3.8
conda activate $YOUR_PY38_ENV_NAME
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```
# Test dataset
We support arbitrary original resolutions. The input video is automatically padded so that its width and height are multiples of 64. The reconstructed video is cropped back to the original size, and the distortion (PSNR/MS-SSIM) is calculated at the original resolution.
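
The padding rule can be illustrated with a small sketch. It only computes the padded size for a few resolutions from the example configs; the function name `padded_size` is hypothetical and not part of this repository, and where the padding is placed (for example right/bottom or centered) is not covered here.

```python
def padded_size(width, height, multiple=64):
    """Round width and height up to the next multiple of `multiple`."""
    padded_w = (width + multiple - 1) // multiple * multiple
    padded_h = (height + multiple - 1) // multiple * multiple
    return padded_w, padded_h


# Resolutions taken from the example dataset configs.
for w, h in [(1920, 1080), (1280, 720), (832, 480), (416, 240)]:
    pw, ph = padded_size(w, h)
    print(f"{w}x{h} -> {pw}x{ph} (extra: {pw - w} columns, {ph - h} rows)")
```

For 1080p content only the height changes (1080 becomes 1088), while 416x240 content is padded in both dimensions.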
## YUV 420 content
Put the *.yuv files in a folder structure similar to the following.

    /media/data/HEVC_B/
        - BQTerrace_1920x1080_60.yuv
        - BasketballDrive_1920x1080_50.yuv
        - ...
    /media/data/HEVC_D/
    /media/data/HEVC_C/
    ...

The dataset structure can be seen in dataset_config_example_yuv420.json.
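
When preparing raw files, it can help to check how many frames a .yuv file holds. The sketch below assumes 8-bit planar YUV 4:2:0 (one byte per sample, chroma planes at half resolution in both directions) and even dimensions; the helper names are hypothetical and not part of this repository.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Bytes in one planar YUV 4:2:0 frame (assumes even width and height)."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # one of the two chroma planes
    return (luma + 2 * chroma) * bytes_per_sample


def count_frames(file_size_bytes, width, height):
    """Number of complete frames that fit in a raw file of the given size."""
    return file_size_bytes // yuv420_frame_bytes(width, height)


frame_bytes = yuv420_frame_bytes(1920, 1080)
print(frame_bytes)                                 # bytes per 1080p frame
print(count_frames(96 * frame_bytes, 1920, 1080))  # frames in a 96-frame file
```

In practice the file size would come from `os.path.getsize(path)`.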
## RGB content
Please convert YUV 420 data to RGB data using the BT.709 conversion matrix.
For example, one video of HEVC Class B can be prepared as follows:
* Make the video path:
```
mkdir BasketballDrive_1920x1080_50
```
* Convert YUV to PNG:
We use the BT.709 conversion matrix to generate PNG data for testing RGB sequences. Please refer to ./test_data_to_png.py for more details.
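
As a rough illustration of what a BT.709 conversion does for a single pixel, here is a minimal sketch. It is not the code in ./test_data_to_png.py: it assumes full-range 8-bit samples with chroma already upsampled to luma resolution, and the actual script may use a different range convention, chroma upsampling, or rounding. The function name is hypothetical.

```python
def yuv_to_rgb_bt709(y, cb, cr):
    """Convert one full-range 8-bit YCbCr sample to 8-bit RGB with BT.709 coefficients.

    The coefficients follow from Kr = 0.2126 and Kb = 0.0722.
    """
    d = cb - 128.0  # centered blue-difference chroma
    e = cr - 128.0  # centered red-difference chroma
    r = y + 1.5748 * e
    g = y - 0.1873 * d - 0.4681 * e
    b = y + 1.8556 * d

    def clip(v):
        # Round to the nearest integer and clamp to the 8-bit range.
        return max(0, min(255, int(round(v))))

    return clip(r), clip(g), clip(b)


# Neutral chroma (128, 128) leaves a gray pixel unchanged.
print(yuv_to_rgb_bt709(128, 128, 128))
```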
The resulting dataset folder structure looks like this:

    /media/data/HEVC_B/
        * BQTerrace_1920x1080_60/
            - im00001.png
            - im00002.png
            - im00003.png
            - ...
        * BasketballDrive_1920x1080_50/
            - im00001.png
            - im00002.png
            - im00003.png
            - ...
        * ...
    /media/data/HEVC_D/
    /media/data/HEVC_C/
    ...

The dataset structure can be seen in dataset_config_example_rgb.json.
# Build the project
Please build the C++ code if you want to test with actual bitstream writing. There is a minor difference between the bit count estimated from entropy (the method used in the paper to report numbers) and the size of the actually written bitstream. Writing the bitstream to a file adds overhead, and the percentage difference depends on the bitstream size. Usually, the overhead for 1080p content is less than 0.5%.
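
The overhead percentage mentioned above can be computed as sketched below. The function name and the input numbers are purely illustrative assumptions, not measurements from this project.

```python
def overhead_percent(estimated_bits, written_bytes):
    """Relative size increase of the written bitstream over the entropy estimate."""
    written_bits = written_bytes * 8
    return 100.0 * (written_bits - estimated_bits) / estimated_bits


# Illustrative numbers only: an 800,000-bit estimate and a 100,400-byte file.
print(f"{overhead_percent(800_000, 100_400):.2f}%")
```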
## On Windows
```bash
cd src
mkdir build
cd build
conda activate $YOUR_PY38_ENV_NAME
cmake ../cpp -G "Visual Studio 16 2019" -A x64
cmake --build . --config Release
```

## On Linux
```bash
sudo apt-get install cmake g++
cd src
mkdir build
cd build
conda activate $YOUR_PY38_ENV_NAME
cmake ../cpp -DCMAKE_BUILD_TYPE=Release
make -j
```

# Pretrained models

* Download [our pretrained models](https://1drv.ms/u/s!AozfVVwtWWYoiWdwDhEkZMIfpon5?e=JcGri5) and put them into the ./checkpoints folder.
* Or run the script in ./checkpoints directly to download the models.

# Test the models

Example of testing the pretrained models at four rate points:
```bash
python test_video.py --i_frame_model_path ./checkpoints/cvpr2023_image_psnr.pth.tar --p_frame_model_path ./checkpoints/cvpr2023_video_psnr.pth.tar --rate_num 4 --test_config ./dataset_config_example_rgb.json --yuv420 0 --cuda 1 --worker 1 --write_stream 0 --output_path output.json --force_intra_period 32 --force_frame_num 96
```
When testing YUV 420 content, please change the model paths and the test configuration JSON file, and specify ```--yuv420 1``` on the command line.

It is recommended to set ```--worker``` to the number of GPUs you have.

You can also specify a different ```--rate_num``` value (2 to 64) to test finer bitrate adjustment.
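
The example command uses `--force_intra_period 32` and `--force_frame_num 96`. As a sketch of what these values imply, the snippet below assumes that a frame is coded as an I-frame whenever its index is a multiple of the intra period and as a P-frame otherwise; this is an assumption about the test script, and the helper name is hypothetical.

```python
def frame_types(frame_num, intra_period):
    """Label each frame 'I' or 'P', assuming an I-frame at every multiple of intra_period."""
    return ["I" if i % intra_period == 0 else "P" for i in range(frame_num)]


types = frame_types(frame_num=96, intra_period=32)
i_frames = [i for i, t in enumerate(types) if t == "I"]
print(i_frames)          # indices of I-frames
print(types.count("P"))  # number of P-frames
```

Under this assumption, 96 frames with an intra period of 32 contain three I-frames, so both the image model and the video model contribute to the reported bitrate.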

# Comparison with other methods
Bit saving over VTM-17.0

<img src="assets/bitsaving.png" width="600">

RD curve of RGB PSNR

<img src="assets/rd_rgb_psnr.png" width="1000">

RD curve of YUV420 PSNR

<img src="assets/rd_yuv420_psnr.png" width="750">

# Acknowledgement
The implementation is based on [CompressAI](https://github.com/InterDigitalInc/CompressAI) and [PyTorchVideoCompression](https://github.com/ZhihaoHu/PyTorchVideoCompression).
# Citation
If you find this work useful for your research, please cite:

```
@inproceedings{li2023neural,
title={Neural Video Compression with Diverse Contexts},
author={Li, Jiahao and Li, Bin and Lu, Yan},
booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
{CVPR} 2023, Vancouver, Canada, June 18-22, 2023},
year={2023}
}
```

# Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third-party’s policies.
Binary file added CVPR2023/assets/bitsaving.png
Binary file added CVPR2023/assets/rd_rgb_psnr.png
Binary file added CVPR2023/assets/rd_yuv420_psnr.png
25 changes: 25 additions & 0 deletions CVPR2023/checkpoints/download.py
@@ -0,0 +1,25 @@
import urllib.request


def download_one(url, target):
    urllib.request.urlretrieve(url, target)


def main():
    urls = {
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211259&authkey=AO_gFvTcYZUFd9U': 'cvpr2023_image_psnr.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211260&authkey=AFWlIyBB5PIudtw': 'cvpr2023_image_ssim.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211261&authkey=AOB9I7Jv25RbyGY': 'cvpr2023_image_yuv420_psnr.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211256&authkey=ACzRzK3XgbQxEyk': 'cvpr2023_video_psnr.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211258&authkey=AIRQMQyZqJWG15k': 'cvpr2023_video_ssim.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211257&authkey=AEQHk7O606IiqGA': 'cvpr2023_video_yuv420_psnr.pth.tar',
    }
    for url in urls:
        target = urls[url]
        print("downloading", target)
        download_one(url, target)
        print("downloaded", target)


if __name__ == "__main__":
    main()
113 changes: 113 additions & 0 deletions CVPR2023/dataset_config_example_rgb.json
@@ -0,0 +1,113 @@
{
"root_path": "/media/data/",
"test_classes": {
"HEVC_B": {
"test": 1,
"base_path": "HEVC_B",
"src_type": "png",
"sequences": {
"BQTerrace_1920x1080_60": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"BasketballDrive_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Cactus_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Kimono1_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ParkScene_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
},
"HEVC_C": {
"test": 1,
"base_path": "HEVC_C",
"src_type": "png",
"sequences": {
"BQMall_832x480_60": {"width": 832, "height": 480, "frames": 96, "gop": 32},
"BasketballDrill_832x480_50": {"width": 832, "height": 480, "frames": 96, "gop": 32},
"PartyScene_832x480_50": {"width": 832, "height": 480, "frames": 96, "gop": 32},
"RaceHorses_832x480_30": {"width": 832, "height": 480, "frames": 96, "gop": 32}
}
},
"HEVC_D": {
"test": 1,
"base_path": "HEVC_D",
"src_type": "png",
"sequences": {
"BasketballPass_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32},
"BlowingBubbles_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32},
"BQSquare_416x240_60": {"width": 416, "height": 240, "frames": 96, "gop": 32},
"RaceHorses_416x240_30": {"width": 416, "height": 240, "frames": 96, "gop": 32}
}
},
"HEVC_E": {
"test": 1,
"base_path": "HEVC_E",
"src_type": "png",
"sequences": {
"FourPeople_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32},
"Johnny_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32},
"KristenAndSara_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32}
}
},
"HEVC_RGB": {
"test": 1,
"base_path": "HEVC_RGB",
"src_type": "png",
"sequences": {
"DucksAndLegs_1920x1080_30_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"EBULupoCandlelight_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"EBURainFruits_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Kimono1_1920x1080_24_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"OldTownCross_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ParkScene_1920x1080_24_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
},
"UVG": {
"test": 1,
"base_path": "UVG",
"src_type": "png",
"sequences": {
"Beauty_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Bosphorus_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"HoneyBee_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Jockey_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ReadySteadyGo_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ShakeNDry_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"YachtRide_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
},
"MCL-JCV": {
"test": 1,
"base_path": "MCL-JCV",
"src_type": "png",
"sequences": {
"videoSRC01_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC02_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC03_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC04_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC05_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC06_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC07_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC08_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC09_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC10_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC11_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC12_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC13_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC14_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC15_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC16_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC17_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC18_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC19_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC20_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC21_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC22_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC23_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC24_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC25_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC26_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC27_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC28_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC29_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC30_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
}
}
}
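
A config of this shape can be walked with a few lines of Python. The sketch below is not the loader used by test_video.py: it assumes that a sequence folder is located at `root_path/base_path/<sequence name>` and that classes with `"test": 0` are skipped, and the function name `iter_sequences` is hypothetical. A trimmed copy of the config is inlined so the example is self-contained.

```python
import json
import os

# Trimmed copy of the config above, inlined for a self-contained example.
EXAMPLE_CONFIG = """
{
  "root_path": "/media/data/",
  "test_classes": {
    "HEVC_D": {
      "test": 1,
      "base_path": "HEVC_D",
      "src_type": "png",
      "sequences": {
        "BasketballPass_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32},
        "BQSquare_416x240_60": {"width": 416, "height": 240, "frames": 96, "gop": 32}
      }
    }
  }
}
"""


def iter_sequences(config):
    """Yield one dict per sequence of every class that is enabled for testing."""
    root = config["root_path"]
    for class_name, cls in config["test_classes"].items():
        if cls["test"] == 0:  # assumption: 0 disables the class
            continue
        for seq_name, info in cls["sequences"].items():
            yield {
                "class": class_name,
                "path": os.path.join(root, cls["base_path"], seq_name),  # assumed layout
                "src_type": cls["src_type"],
                **info,
            }


config = json.loads(EXAMPLE_CONFIG)
sequences = list(iter_sequences(config))
for seq in sequences:
    print(seq["path"], seq["width"], seq["height"], seq["frames"], seq["gop"])
```

To inspect the full file instead, replace `json.loads(EXAMPLE_CONFIG)` with `json.load` on an opened `dataset_config_example_rgb.json`.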