code and model for the paper Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression in ACM MM 2022.
ustclibin authored and yaohualibin committed Jul 14, 2022
1 parent 3c3e9ec commit 74f2d6f
Showing 28 changed files with 3,200 additions and 4 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -3,4 +3,6 @@
*.bin
*.png
*.so
build/
build/
*.tar
*.pyd
109 changes: 108 additions & 1 deletion ACMMM2022/README.md
@@ -1 +1,108 @@
Coming soon.
# Introduction

Official PyTorch implementation of [Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression](https://arxiv.org/abs/2207.05894), in ACM MM 2022.

# Prerequisites
* Python 3.8 and conda (get [Conda](https://www.anaconda.com/))
* CUDA, if you want to use a GPU
* Environment
```
conda create -n $YOUR_PY38_ENV_NAME python=3.8
conda activate $YOUR_PY38_ENV_NAME
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```
# Test dataset
Videos of arbitrary resolution are supported. The input is automatically padded so that its width and height are multiples of 64, and the reconstructed video is cropped back to the original size. The distortion (PSNR/MS-SSIM) is calculated at the original resolution.
The dataset format is shown in `dataset_config_example.json`.
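
As a rough illustration of the padding rule above, the following sketch computes the padded size for a given resolution. The function name `padded_size` is hypothetical and is not taken from this repository; where the extra rows and columns are placed is decided by the actual test code and is not modelled here.

```python
def padded_size(height, width, multiple=64):
    """Return (padded_height, padded_width), each rounded up to `multiple`."""
    # The outer modulo keeps sizes that are already aligned unchanged.
    pad_h = (multiple - height % multiple) % multiple
    pad_w = (multiple - width % multiple) % multiple
    return height + pad_h, width + pad_w


if __name__ == "__main__":
    # Resolutions that appear in dataset_config_example.json.
    for h, w in [(1080, 1920), (480, 832), (240, 416), (720, 1280)]:
        print((h, w), "->", padded_size(h, w))
```

For 1080p input, only the height changes (1080 becomes 1088), because 1920 is already a multiple of 64.
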
For example, one video of HEVC Class B can be prepared as:
* Create the video folder:
```
mkdir BasketballDrive_1920x1080_50
```
* Convert YUV to PNG:
```
ffmpeg -pix_fmt yuv420p -s 1920x1080 -i BasketballDrive_1920x1080_50.yuv -f image2 BasketballDrive_1920x1080_50/im%05d.png
```
Finally, the dataset folder structure looks like this:

    /media/data/HEVC_B/
        * BQTerrace_1920x1080_60/
            - im00001.png
            - im00002.png
            - im00003.png
            - ...
        * BasketballDrive_1920x1080_50/
            - im00001.png
            - im00002.png
            - im00003.png
            - ...
        * ...
    /media/data/HEVC_D/
    /media/data/HEVC_C/
    ...
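
Before running a test, it can help to confirm that a sequence folder actually contains the frames the configuration expects. The sketch below assumes the `im%05d.png` naming shown above, starting at `im00001.png`; the helper names `expected_frame_paths` and `missing_frames` are hypothetical and are not part of this repository.

```python
import os


def expected_frame_paths(root_path, base_path, sequence, frames):
    """Build the PNG paths im00001.png ... for one sequence (1-based, 5 digits)."""
    seq_dir = os.path.join(root_path, base_path, sequence)
    return [os.path.join(seq_dir, f"im{i:05d}.png") for i in range(1, frames + 1)]


def missing_frames(root_path, base_path, sequence, frames):
    """Return the expected frame paths that do not exist on disk."""
    return [
        path
        for path in expected_frame_paths(root_path, base_path, sequence, frames)
        if not os.path.isfile(path)
    ]


if __name__ == "__main__":
    # Values mirror one entry of dataset_config_example.json.
    missing = missing_frames(
        "/media/data/", "HEVC_B", "BasketballDrive_1920x1080_50", 96
    )
    print(f"{len(missing)} of 96 frames missing")
```
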
# Build the project
Please build the C++ code if you want to test with actual bitstream writing. The bit count estimated from entropy (the method used in the paper to report numbers) differs slightly from the size of the bitstream that is actually written: writing the bitstream to a file adds overhead, and the relative difference depends on the bitstream size. Usually, the overhead for 1080p content is less than 0.5%.
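
To make the "difference percentage" concrete, here is a small arithmetic sketch. The function name `overhead_percent` and all numbers are hypothetical; in particular, the fixed overhead of 400 bits is an assumption chosen only to show why the percentage shrinks as the bitstream grows, and it is not a measurement from this codec.

```python
def overhead_percent(estimated_bits, written_bits):
    """Relative size difference, in percent, of the written stream vs. the estimate."""
    if estimated_bits <= 0:
        raise ValueError("estimated_bits must be positive")
    return 100.0 * (written_bits - estimated_bits) / estimated_bits


if __name__ == "__main__":
    # Hypothetical fixed overhead, used only to illustrate the arithmetic.
    fixed_overhead_bits = 400
    for estimated in (10_000, 100_000, 1_000_000):
        written = estimated + fixed_overhead_bits
        print(estimated, f"{overhead_percent(estimated, written):.3f}%")
```
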
## On Windows
```bash
cd src
mkdir build
cd build
conda activate $YOUR_PY38_ENV_NAME
cmake ../cpp -G "Visual Studio 16 2019" -A x64
cmake --build . --config Release
```

## On Linux
```bash
sudo apt-get install cmake g++
cd src
mkdir build
cd build
conda activate $YOUR_PY38_ENV_NAME
cmake ../cpp -DCMAKE_BUILD_TYPE=Release
make -j
```

# Pretrained models

* Download [our pretrained models](https://1drv.ms/u/s!AozfVVwtWWYoiUAGk6xr-oELbodn?e=kry2Nk) and put them into the `./checkpoints` folder.
* Or run the script in `./checkpoints` directly to download the models.
# Test the models

Example of testing the pretrained models with four rate points:
```bash
python test_video.py --i_frame_model_path ./checkpoints/acmmm2022_image_psnr.pth.tar --model_path ./checkpoints/acmmm2022_video_psnr.pth.tar --rate_num 4 --test_config ./dataset_config_example.json --cuda 1 -w 1 --write_stream 0 --output_path output.json --force_intra_period 32 --force_frame_num 96
```
It is recommended to set the `--worker` number equal to the number of GPUs.

You can also specify different q_scales values to test other bitrate points. It is suggested to change all three q_scales together and to generate interpolated q_scales between the minimum and maximum values.
For example, use `intra_q_scales = scale_list_to_str(interpolate_log(minimum_value, maximum_value, number_of_rate_points))`.
Use `--rate_num` to specify the number of rate points, and `--i_frame_q_scales`, `--p_frame_mv_y_q_scales`, and `--p_frame_y_q_scales` to specify the q_scales.
Please note that q_scales outside the range [minimum_value, maximum_value] have not been tested and may produce poor encoding results.
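
The interpolation mentioned above can be sketched as follows. This is a minimal stand-in, not the repository's own code: the names `interpolate_log_sketch` and `scale_list_to_str_sketch` are hypothetical, and the ascending order and four-decimal formatting are assumptions. Check `test_video.py` for the order and format the q_scales arguments actually expect.

```python
import math


def interpolate_log_sketch(min_val, max_val, num):
    """Return `num` values from min_val to max_val, evenly spaced in the log domain."""
    if not (0 < min_val < max_val):
        raise ValueError("require 0 < min_val < max_val")
    if num < 2:
        raise ValueError("num must be at least 2")
    log_min, log_max = math.log(min_val), math.log(max_val)
    step = (log_max - log_min) / (num - 1)
    return [math.exp(log_min + i * step) for i in range(num)]


def scale_list_to_str_sketch(q_scales):
    """Format the values as one space-separated string (assumed format)."""
    return " ".join(f"{q:.4f}" for q in q_scales)


if __name__ == "__main__":
    # Hypothetical range; substitute the minimum and maximum of the pretrained model.
    print(scale_list_to_str_sketch(interpolate_log_sketch(0.5, 2.0, 4)))
```

Spacing the values evenly in the log domain means that neighbouring q_scales differ by a constant ratio rather than a constant offset.
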


# R-D Curves
![PSNR RD Curve](assets/rd_curve_psnr.png)

# Acknowledgement
The implementation is based on [CompressAI](https://github.com/InterDigitalInc/CompressAI) and [PyTorchVideoCompression](https://github.com/ZhihaoHu/PyTorchVideoCompression).
# Citation
If you find this work useful for your research, please cite:

```
@inproceedings{li2022hybrid,
title={Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression},
author={Li, Jiahao and Li, Bin and Lu, Yan},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
year={2022}
}
```

# Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
Binary file added ACMMM2022/assets/rd_curve_psnr.png
23 changes: 23 additions & 0 deletions ACMMM2022/checkpoints/download.py
@@ -0,0 +1,23 @@
import urllib.request


def download_one(url, target):
    urllib.request.urlretrieve(url, target)


def main():
    urls = {
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211220&authkey=AMRg1W3PVt_F3yc': 'acmmm2022_image_psnr.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211219&authkey=ACJnPOPf1ntw_w0': 'acmmm2022_image_ssim.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211217&authkey=AKpdgXQtvs-OxRs': 'acmmm2022_video_psnr.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211218&authkey=ANxapLv3PcCJ4Vw': 'acmmm2022_video_ssim.pth.tar',
    }
    for url in urls:
        target = urls[url]
        print("downloading", target)
        download_one(url, target)
        print("downloaded", target)


if __name__ == "__main__":
    main()
113 changes: 113 additions & 0 deletions ACMMM2022/dataset_config_example.json
@@ -0,0 +1,113 @@
{
"root_path": "/media/data/",
"test_classes": {
"HEVC_B": {
"test": 1,
"base_path": "HEVC_B",
"src_type": "png",
"sequences": {
"BQTerrace_1920x1080_60": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"BasketballDrive_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Cactus_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Kimono1_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ParkScene_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
},
"HEVC_C": {
"test": 1,
"base_path": "HEVC_C",
"src_type": "png",
"sequences": {
"BQMall_832x480_60": {"width": 832, "height": 480, "frames": 96, "gop": 32},
"BasketballDrill_832x480_50": {"width": 832, "height": 480, "frames": 96, "gop": 32},
"PartyScene_832x480_50": {"width": 832, "height": 480, "frames": 96, "gop": 32},
"RaceHorses_832x480_30": {"width": 832, "height": 480, "frames": 96, "gop": 32}
}
},
"HEVC_D": {
"test": 1,
"base_path": "HEVC_D",
"src_type": "png",
"sequences": {
"BasketballPass_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32},
"BlowingBubbles_416x240_50": {"width": 416, "height": 240, "frames": 96, "gop": 32},
"BQSquare_416x240_60": {"width": 416, "height": 240, "frames": 96, "gop": 32},
"RaceHorses_416x240_30": {"width": 416, "height": 240, "frames": 96, "gop": 32}
}
},
"HEVC_E": {
"test": 1,
"base_path": "HEVC_E",
"src_type": "png",
"sequences": {
"FourPeople_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32},
"Johnny_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32},
"KristenAndSara_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "gop": 32}
}
},
"HEVC_RGB": {
"test": 1,
"base_path": "HEVC_RGB",
"src_type": "png",
"sequences": {
"DucksAndLegs_1920x1080_30_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"EBULupoCandlelight_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"EBURainFruits_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Kimono1_1920x1080_24_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"OldTownCross_1920x1080_50_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ParkScene_1920x1080_24_RGB": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
},
"UVG": {
"test": 1,
"base_path": "UVG",
"src_type": "png",
"sequences": {
"Beauty_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Bosphorus_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"HoneyBee_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"Jockey_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ReadySteadyGo_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"ShakeNDry_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"YachtRide_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
},
"MCL-JCV": {
"test": 1,
"base_path": "MCL-JCV",
"src_type": "png",
"sequences": {
"videoSRC01_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC02_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC03_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC04_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC05_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC06_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC07_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC08_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC09_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC10_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC11_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC12_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC13_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC14_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC15_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC16_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC17_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC18_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC19_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC20_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC21_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC22_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC23_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC24_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC25_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC26_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC27_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC28_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC29_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "gop": 32},
"videoSRC30_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "gop": 32}
}
}
}
}
9 changes: 9 additions & 0 deletions ACMMM2022/requirements.txt
@@ -0,0 +1,9 @@
numpy>=1.20.0
scipy
matplotlib==3.3.4
torch>=1.10.0
pytorch-msssim==0.2.0
tensorboard
tqdm
bd-metric
ptflops
5 changes: 5 additions & 0 deletions ACMMM2022/src/cpp/3rdparty/CMakeLists.txt
@@ -0,0 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

add_subdirectory(pybind11)
add_subdirectory(ryg_rans)
24 changes: 24 additions & 0 deletions ACMMM2022/src/cpp/3rdparty/pybind11/CMakeLists.txt
@@ -0,0 +1,24 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

configure_file(CMakeLists.txt.in pybind11-download/CMakeLists.txt)
execute_process(COMMAND ${CMAKE_COMMAND} -G "${CMAKE_GENERATOR}" .
RESULT_VARIABLE result
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/pybind11-download )
if(result)
message(FATAL_ERROR "CMake step for pybind11 failed: ${result}")
endif()
execute_process(COMMAND ${CMAKE_COMMAND} --build .
RESULT_VARIABLE result
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/pybind11-download )
if(result)
message(FATAL_ERROR "Build step for pybind11 failed: ${result}")
endif()

add_subdirectory(${CMAKE_CURRENT_BINARY_DIR}/pybind11-src/
${CMAKE_CURRENT_BINARY_DIR}/pybind11-build/
EXCLUDE_FROM_ALL)

set(PYBIND11_INCLUDE
${CMAKE_CURRENT_BINARY_DIR}/pybind11-src/include/
CACHE INTERNAL "")
33 changes: 33 additions & 0 deletions ACMMM2022/src/cpp/3rdparty/pybind11/CMakeLists.txt.in
@@ -0,0 +1,33 @@
cmake_minimum_required(VERSION 3.6.3)

project(pybind11-download NONE)

include(ExternalProject)
if(IS_DIRECTORY "${PROJECT_BINARY_DIR}/3rdparty/pybind11/pybind11-src/include")
ExternalProject_Add(pybind11
GIT_REPOSITORY https://github.com/pybind/pybind11.git
GIT_TAG v2.6.1
GIT_SHALLOW 1
SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/pybind11-src"
BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/pybind11-build"
DOWNLOAD_COMMAND ""
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND ""
TEST_COMMAND ""
)
else()
ExternalProject_Add(pybind11
GIT_REPOSITORY https://github.com/pybind/pybind11.git
GIT_TAG v2.6.1
GIT_SHALLOW 1
SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/pybind11-src"
BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/pybind11-build"
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND ""
TEST_COMMAND ""
)
endif()
24 changes: 24 additions & 0 deletions ACMMM2022/src/cpp/3rdparty/ryg_rans/CMakeLists.txt
@@ -0,0 +1,24 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

configure_file(CMakeLists.txt.in ryg_rans-download/CMakeLists.txt)
execute_process(COMMAND ${CMAKE_COMMAND} -G "${CMAKE_GENERATOR}" .
RESULT_VARIABLE result
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-download )
if(result)
message(FATAL_ERROR "CMake step for ryg_rans failed: ${result}")
endif()
execute_process(COMMAND ${CMAKE_COMMAND} --build .
RESULT_VARIABLE result
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-download )
if(result)
message(FATAL_ERROR "Build step for ryg_rans failed: ${result}")
endif()

# add_subdirectory(${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-src/
# ${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-build
# EXCLUDE_FROM_ALL)

set(RYG_RANS_INCLUDE
${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-src/
CACHE INTERNAL "")
33 changes: 33 additions & 0 deletions ACMMM2022/src/cpp/3rdparty/ryg_rans/CMakeLists.txt.in
@@ -0,0 +1,33 @@
cmake_minimum_required(VERSION 3.6.3)

project(ryg_rans-download NONE)

include(ExternalProject)
if(EXISTS "${PROJECT_BINARY_DIR}/3rdparty/ryg_rans/ryg_rans-src/rans64.h")
ExternalProject_Add(ryg_rans
GIT_REPOSITORY https://github.com/rygorous/ryg_rans.git
GIT_TAG c9d162d996fd600315af9ae8eb89d832576cb32d
GIT_SHALLOW 1
SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-src"
BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-build"
DOWNLOAD_COMMAND ""
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND ""
TEST_COMMAND ""
)
else()
ExternalProject_Add(ryg_rans
GIT_REPOSITORY https://github.com/rygorous/ryg_rans.git
GIT_TAG c9d162d996fd600315af9ae8eb89d832576cb32d
GIT_SHALLOW 1
SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-src"
BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/ryg_rans-build"
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND ""
TEST_COMMAND ""
)
endif()