code and model for the paper Neural Video Compression with Diverse Contexts in CVPR 2023.
1 parent ef9a2db, commit 7df42d3
Showing 37 changed files with 4,840 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
# Introduction | ||
|
||
Official PyTorch implementation for [Neural Video Compression with Diverse Contexts](https://arxiv.org/abs/2302.14402), in CVPR 2023. | ||
|
||
# Prerequisites | ||
* Python 3.8 and conda (get [Conda](https://www.anaconda.com/)) | ||
* CUDA, if you want to use a GPU | ||
* Environment | ||
``` | ||
conda create -n $YOUR_PY38_ENV_NAME python=3.8 | ||
conda activate $YOUR_PY38_ENV_NAME | ||
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch | ||
pip install -r requirements.txt | ||
``` | ||
# Test dataset | ||
We support arbitrary original resolutions. The input video is automatically padded so that its width and height are multiples of 64. The reconstructed video is cropped back to the original size. The distortion (PSNR/MS-SSIM) is calculated at the original resolution. | ||
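The padded size can be worked out with a little integer arithmetic. The following is a minimal sketch, not the repository's code: the helper names `padded_size` and `padding_amounts` are hypothetical, and it only computes the sizes (how the padding is distributed around the frame is not stated here).

```python
def padded_size(height, width, multiple=64):
    """Return (padded_height, padded_width), each rounded up to `multiple`."""
    padded_h = (height + multiple - 1) // multiple * multiple
    padded_w = (width + multiple - 1) // multiple * multiple
    return padded_h, padded_w


def padding_amounts(height, width, multiple=64):
    """Return how many rows and columns must be added to reach the padded size."""
    padded_h, padded_w = padded_size(height, width, multiple)
    return padded_h - height, padded_w - width


if __name__ == "__main__":
    # Resolutions taken from the HEVC test classes used in this repository.
    for h, w in [(1080, 1920), (480, 832), (240, 416), (720, 1280)]:
        print((h, w), "->", padded_size(h, w))
```

For 1080p content, for example, only the height changes (1080 becomes 1088), and the 8 extra rows are cropped away again after reconstruction.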
## YUV 420 content | ||
Put the *.yuv files in a folder structure similar to the following. | ||
/media/data/HEVC_B/ | ||
- BQTerrace_1920x1080_60.yuv | ||
- BasketballDrive_1920x1080_50.yuv | ||
- ... | ||
/media/data/HEVC_D/ | ||
/media/data/HEVC_C/ | ||
... | ||
The dataset structure can be seen in dataset_config_example_yuv420.json. | ||
## RGB content | ||
Please convert the YUV 420 data to RGB data using the BT.709 conversion matrix. | ||
For example, one video of HEVC Class B can be prepared as: | ||
* Create the video folder: | ||
``` | ||
mkdir BasketballDrive_1920x1080_50 | ||
``` | ||
* Convert YUV to PNG: | ||
We use the BT.709 conversion matrix to generate the PNG data for testing RGB sequences. Please refer to ./test_data_to_png.py for more details. | ||
Finally, the dataset folder structure looks like this: | ||
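To illustrate what the BT.709 matrix does, here is a minimal per-pixel sketch. It is not the code from ./test_data_to_png.py: the function name `yuv_to_rgb_bt709` is hypothetical, and it assumes 8-bit full-range samples with chroma centered at 128 and chroma planes already upsampled to luma resolution. The script itself may use a different range convention, so check it before relying on these numbers.

```python
def yuv_to_rgb_bt709(y, u, v):
    """Convert one 8-bit YUV triple to 8-bit RGB using BT.709 coefficients.

    Assumption: full-range values (0..255) with chroma centered at 128.
    """
    cb = u - 128.0
    cr = v - 128.0
    # Coefficients follow from the BT.709 luma weights Kr = 0.2126, Kb = 0.0722.
    r = y + 1.5748 * cr
    g = y - 0.1873 * cb - 0.4681 * cr
    b = y + 1.8556 * cb

    def clip(x):
        # Round to the nearest integer and clamp to the 8-bit range.
        return max(0, min(255, int(round(x))))

    return clip(r), clip(g), clip(b)


if __name__ == "__main__":
    # Neutral chroma leaves a gray pixel unchanged.
    print(yuv_to_rgb_bt709(128, 128, 128))
```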
/media/data/HEVC_B/ | ||
* BQTerrace_1920x1080_60/ | ||
- im00001.png | ||
- im00002.png | ||
- im00003.png | ||
- ... | ||
* BasketballDrive_1920x1080_50/ | ||
- im00001.png | ||
- im00002.png | ||
- im00003.png | ||
- ... | ||
* ... | ||
/media/data/HEVC_D/ | ||
/media/data/HEVC_C/ | ||
... | ||
The dataset structure can be seen in dataset_config_example_rgb.json. | ||
# Build the project | ||
Please build the C++ code if you want to test with actual bitstream writing. There is a minor difference between the bit count estimated from entropy (the method used in the paper to report numbers) and the size of an actually written bitstream. Writing the bitstream to a file adds overhead, and the percentage difference depends on the bitstream size. Usually, the overhead for 1080p content is less than 0.5%. | ||
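The overhead percentage mentioned above is a simple ratio. The sketch below shows the arithmetic only; the function name `overhead_percent` and the bit counts in the example are hypothetical and are not measurements from this project.

```python
def overhead_percent(estimated_bits, written_bits):
    """Relative size increase of the written bitstream over the entropy estimate."""
    return 100.0 * (written_bits - estimated_bits) / estimated_bits


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    estimated = 1_000_000  # bits estimated from entropy
    written = 1_004_000    # bits in the file actually written
    print(f"{overhead_percent(estimated, written):.2f}%")
```

Because the per-file overhead is roughly fixed, the percentage shrinks as the bitstream grows, which is consistent with high-resolution content showing a small relative overhead.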
## On Windows | ||
```bash | ||
cd src | ||
mkdir build | ||
cd build | ||
conda activate $YOUR_PY38_ENV_NAME | ||
cmake ../cpp -G "Visual Studio 16 2019" -A x64 | ||
cmake --build . --config Release | ||
``` | ||
|
||
## On Linux | ||
```bash | ||
sudo apt-get install cmake g++ | ||
cd src | ||
mkdir build | ||
cd build | ||
conda activate $YOUR_PY38_ENV_NAME | ||
cmake ../cpp -DCMAKE_BUILD_TYPE=Release | ||
make -j | ||
``` | ||
|
||
# Pretrained models | ||
|
||
* Download [our pretrained models](https://1drv.ms/u/s!AozfVVwtWWYoiWdwDhEkZMIfpon5?e=JcGri5) and put them into the ./checkpoints folder. | ||
* Or run the script in ./checkpoints directly to download the models. | ||
|
||
# Test the models | ||
|
||
Example of testing the pretrained models at four rate points: | ||
```bash | ||
python test_video.py --i_frame_model_path ./checkpoints/cvpr2023_image_psnr.pth.tar --p_frame_model_path ./checkpoints/cvpr2023_video_psnr.pth.tar --rate_num 4 --test_config ./dataset_config_example_rgb.json --yuv420 0 --cuda 1 --worker 1 --write_stream 0 --output_path output.json --force_intra_period 32 --force_frame_num 96 | ||
``` | ||
When testing YUV 420 content, please change the model path, test configuration json file and specify ```--yuv420 1``` in the command line. | ||
|
||
It is recommended to set the ```--worker``` number equal to the number of GPUs you have. | ||
|
||
You can also specify a different ```--rate_num``` value (from 2 to 64) to test finer bitrate adjustment. | ||
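The YUV 420 variant described above can be assembled as follows. This is a minimal sketch: the helper name `build_test_command` is hypothetical, the checkpoint and config file names are the ones that appear in this commit, and it assumes every other flag carries over unchanged from the RGB example.

```python
import shlex


def build_test_command(yuv420=False, rate_num=4, worker=1):
    """Build the argument list for test_video.py (flags copied from the README example)."""
    if yuv420:
        i_model = "./checkpoints/cvpr2023_image_yuv420_psnr.pth.tar"
        p_model = "./checkpoints/cvpr2023_video_yuv420_psnr.pth.tar"
        config = "./dataset_config_example_yuv420.json"
    else:
        i_model = "./checkpoints/cvpr2023_image_psnr.pth.tar"
        p_model = "./checkpoints/cvpr2023_video_psnr.pth.tar"
        config = "./dataset_config_example_rgb.json"
    return [
        "python", "test_video.py",
        "--i_frame_model_path", i_model,
        "--p_frame_model_path", p_model,
        "--rate_num", str(rate_num),
        "--test_config", config,
        "--yuv420", "1" if yuv420 else "0",
        "--cuda", "1",
        "--worker", str(worker),
        "--write_stream", "0",
        "--output_path", "output.json",
        "--force_intra_period", "32",
        "--force_frame_num", "96",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_test_command(yuv420=True)))
```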
|
||
# Comparison with other methods | ||
Bit saving over VTM-17.0 | ||
|
||
<img src="assets/bitsaving.png" width="600"> | ||
|
||
RD curve of RGB PSNR | ||
|
||
<img src="assets/rd_rgb_psnr.png" width="1000"> | ||
|
||
RD curve of YUV420 PSNR | ||
|
||
<img src="assets/rd_yuv420_psnr.png" width="750"> | ||
|
||
# Acknowledgement | ||
The implementation is based on [CompressAI](https://github.com/InterDigitalInc/CompressAI) and [PyTorchVideoCompression](https://github.com/ZhihaoHu/PyTorchVideoCompression). | ||
# Citation | ||
If you find this work useful for your research, please cite: | ||
|
||
``` | ||
@inproceedings{li2023neural, | ||
title={Neural Video Compression with Diverse Contexts}, | ||
author={Li, Jiahao and Li, Bin and Lu, Yan}, | ||
booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, | ||
{CVPR} 2023, Vancouver, Canada, June 18-22, 2023}, | ||
year={2023} | ||
} | ||
``` | ||
|
||
# Trademarks | ||
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
import urllib.request | ||
|
||
|
||
def download_one(url, target): | ||
    urllib.request.urlretrieve(url, target) | ||
|
||
|
||
def main(): | ||
    urls = { | ||
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211259&authkey=AO_gFvTcYZUFd9U': 'cvpr2023_image_psnr.pth.tar', | ||
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211260&authkey=AFWlIyBB5PIudtw': 'cvpr2023_image_ssim.pth.tar', | ||
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211261&authkey=AOB9I7Jv25RbyGY': 'cvpr2023_image_yuv420_psnr.pth.tar', | ||
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211256&authkey=ACzRzK3XgbQxEyk': 'cvpr2023_video_psnr.pth.tar', | ||
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211258&authkey=AIRQMQyZqJWG15k': 'cvpr2023_video_ssim.pth.tar', | ||
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211257&authkey=AEQHk7O606IiqGA': 'cvpr2023_video_yuv420_psnr.pth.tar', | ||
    } | ||
    for url in urls: | ||
        target = urls[url] | ||
        print("downloading", target) | ||
        download_one(url, target) | ||
        print("downloaded", target) | ||
|
||
|
||
if __name__ == "__main__": | ||
    main() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
{ | ||
"root_path": "/media/data/", | ||
"test_classes": { | ||
"HEVC_B": { | ||
"test": 1, | ||
"base_path": "HEVC_B", | ||
"src_type": "png", | ||
"sequences": { | ||
"BQTerrace_1920x1080_60": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"BasketballDrive_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"Cactus_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"Kimono1_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"ParkScene_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32} | ||
} | ||
}, | ||
"HEVC_C": { | ||
"test": 1, | ||
"base_path": "HEVC_C", | ||
"src_type": "png", | ||
"sequences": { | ||
"BQMall_832x480_60": {"width": 832, "height": 480, "frames": 96, "gop": 32}, | ||
"BasketballDrill_832x480_50": {"width": 832, "height": 480, "frames": 96, "gop": 32}, | ||
"PartyScene_832x480_50": {"width": 832, "height": 480, "frames": 96, "gop": 32}, | ||
"RaceHorses_832x480_30": {"width": 832, "height": 480, "frames": 96, "gop": 32} | ||
} | ||
}, | ||
"HEVC_D": { | ||
"test": 1, | ||
"base_path": "HEVC_D", | ||
"src_type": "png", | ||
"sequences": { | ||
"BasketballPass_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32}, | ||
"BlowingBubbles_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32}, | ||
"BQSquare_416x240_60": {"width": 416, "height": 240, "frames": 96, "gop": 32}, | ||
"RaceHorses_416x240_30": {"width": 416, "height": 240, "frames": 96, "gop": 32} | ||
} | ||
}, | ||
"HEVC_E": { | ||
"test": 1, | ||
"base_path": "HEVC_E", | ||
"src_type": "png", | ||
"sequences": { | ||
"FourPeople_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32}, | ||
"Johnny_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32}, | ||
"KristenAndSara_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32} | ||
} | ||
}, | ||
"HEVC_RGB": { | ||
"test": 1, | ||
"base_path": "HEVC_RGB", | ||
"src_type": "png", | ||
"sequences": { | ||
"DucksAndLegs_1920x1080_30_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"EBULupoCandlelight_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"EBURainFruits_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"Kimono1_1920x1080_24_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"OldTownCross_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"ParkScene_1920x1080_24_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32} | ||
} | ||
}, | ||
"UVG": { | ||
"test": 1, | ||
"base_path": "UVG", | ||
"src_type": "png", | ||
"sequences": { | ||
"Beauty_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"Bosphorus_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"HoneyBee_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"Jockey_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"ReadySteadyGo_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"ShakeNDry_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"YachtRide_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32} | ||
} | ||
}, | ||
"MCL-JCV": { | ||
"test": 1, | ||
"base_path": "MCL-JCV", | ||
"src_type": "png", | ||
"sequences": { | ||
"videoSRC01_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC02_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC03_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC04_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC05_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC06_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC07_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC08_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC09_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC10_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC11_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC12_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC13_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC14_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC15_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC16_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC17_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC18_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC19_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC20_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC21_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC22_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC23_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC24_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC25_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC26_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC27_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC28_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC29_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}, | ||
"videoSRC30_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32} | ||
} | ||
} | ||
} | ||
} |
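The config above can be walked to list the folder of each sequence and the PNG frames inside it. The following is a minimal sketch under stated assumptions, not the loader used by test_video.py: the helper names `list_sequences` and `frame_path` are hypothetical, `"test": 1` is assumed to mean the class is enabled, and the `im00001.png` naming is taken from the README's folder listing.

```python
import posixpath


def list_sequences(config):
    """Return (class_name, sequence_dir, frames, gop) for every enabled sequence."""
    result = []
    for class_name, cls in config["test_classes"].items():
        if cls["test"] == 0:  # assumption: 0 disables a class
            continue
        for seq_name, seq in cls["sequences"].items():
            # The config uses POSIX-style paths, so join them with posixpath.
            seq_dir = posixpath.join(config["root_path"], cls["base_path"], seq_name)
            result.append((class_name, seq_dir, seq["frames"], seq["gop"]))
    return result


def frame_path(seq_dir, index):
    """Path of the 1-based frame `index`, following the im00001.png pattern."""
    return posixpath.join(seq_dir, f"im{index:05d}.png")


if __name__ == "__main__":
    # A small excerpt of the config; load the real file with json.load instead.
    sample = {
        "root_path": "/media/data/",
        "test_classes": {
            "HEVC_B": {
                "test": 1,
                "base_path": "HEVC_B",
                "src_type": "png",
                "sequences": {
                    "BQTerrace_1920x1080_60": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
                },
            },
        },
    }
    for class_name, seq_dir, frames, gop in list_sequences(sample):
        print(class_name, frame_path(seq_dir, 1), frames, gop)
```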