
MONAI's CRF takes too long on CPU #2250

Closed
masadcv opened this issue May 26, 2021 · 5 comments
Labels
question Further information is requested

Comments

masadcv (Contributor) commented May 26, 2021

Describe the bug
I am running MONAI's CRF implementation on CPU on a 3D volume of size (120, 150, 100).
It takes 177.5296 sec to run on CPU using MONAI's implementation
The same can be achieved in 5.8184 sec using SimpleCRF's implementation from: https://github.com/HiLab-git/SimpleCRF

I have set up a test script to replicate this here: https://gist.github.com/masadcv/84f1bc9f505056ea8f4290d14a002d2a

It also seems that MONAI's implementation uses significantly more memory on CPU than SimpleCRF. I am not sure whether that is expected, but it may be worth investigating if possible.

To Reproduce
Steps to reproduce the behavior:

  1. Download test script from: https://gist.github.com/masadcv/84f1bc9f505056ea8f4290d14a002d2a
  2. Install MONAI with: BUILD_MONAI=1 pip -q install git+https://github.com/Project-MONAI/MONAI#egg=monai
  3. Install the other required packages: pip install simplecrf nibabel wget
  4. Run the test script: python testscript.py

Expected behavior
I expect the two implementations (MONAI CRF vs SimpleCRF) to be in the same/similar ballpark in terms of execution time. At the moment, MONAI's implementation seems orders of magnitude slower.
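The comparison in the linked gist boils down to timing the two CRF calls on the same volume. A minimal, stdlib-only sketch of such a timing harness is below; `run_monai_crf` and `run_simplecrf` are placeholders standing in for the actual MONAI and SimpleCRF calls, which require the respective packages and real data:

```python
import time

def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time (in seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Placeholders standing in for the two CRF calls in the gist.
def run_monai_crf(volume):
    return volume  # would call MONAI's CRF block on unary/feature tensors

def run_simplecrf(volume):
    return volume  # would call SimpleCRF's 3-D dense CRF on the same data

volume = [[0.0] * 10 for _ in range(10)]  # stand-in for the (120, 150, 100) volume
print(f"MONAI CRF: {time_call(run_monai_crf, volume):.4f} s")
print(f"SimpleCRF: {time_call(run_simplecrf, volume):.4f} s")
```

Taking the best of several runs reduces noise from caching and scheduler jitter, which matters when the two implementations are orders of magnitude apart.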

Environment

Ensuring you use the relevant python executable, please paste the output of:

python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.5.2+67.g013186d
Numpy version: 1.20.3
Pytorch version: 1.8.1+cu102
MONAI flags: HAS_EXT = True, USE_COMPILED = False
MONAI rev id: 013186dd9d0408026c38b4c7a75ee34e031b13d1

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.2.1
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 8.2.0
Tensorboard version: NOT INSTALLED or UNKNOWN VERSION.
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.9.1+cu102
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: NOT INSTALLED or UNKNOWN VERSION.
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
`psutil` required for `print_system_info`

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 10.2
cuDNN enabled: True
cuDNN version: 7605
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70']
GPU 0 Name: Quadro RTX 3000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 30
GPU 0 Total memory (GB): 5.8
GPU 0 CUDA capability (maj.min): 7.5

cc: @charliebudd @tvercaut

charliebudd (Collaborator) commented

There's a quick fix I've been meaning to commit, and then there's optimisation of the PHL message passing, which is a longer job I'm working on in the background. Optimisation has mainly been focused on the GPU implementation, with the CPU as a fallback, but I think it is reasonable to expect both to be high performance.

tvercaut (Member) commented

Naive first question, but is the C++ code compiled with optimisation enabled? I can't see anything like -O2 or -O3 in setup.py, but I guess it may come from elsewhere.
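For reference, torch's C++ extension builder accepts explicit compiler flags via `extra_compile_args`, so the optimisation level can be made explicit in setup.py. A hedged sketch of what that might look like (the module name and source path here are illustrative, not MONAI's actual layout):

```python
# Hypothetical setup.py excerpt: pinning the optimisation level
# when building a torch C++ extension.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="monai_ext",
    ext_modules=[
        CppExtension(
            name="monai._C",
            sources=["monai/csrc/ext.cpp"],
            extra_compile_args=["-O3"],  # make the optimisation level explicit
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```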

charliebudd (Collaborator) commented

We compile the C++ extension with torch's setup tools wrapper. I believe this handles these things; off the top of my head I think it's -O2. PR #2261 implements the quick fix I alluded to earlier. While I have not tested it against SimpleCRF, it does now run at the same order of magnitude as the CRF-as-RNN implementation and produces identical (by eye) results. When the JIT system is in, I'll move the PHL over to there and make my optimisations. The main one I've planned is to separate the construction of the lattice from the application of it. As the CRF iterates over the same PHL filter with the same features, this will mean we only need to construct it once.
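The proposed optimisation (build the permutohedral lattice once, reuse it across mean-field iterations) could be sketched as a filter that caches its lattice as long as the feature tensor is unchanged. All names below are hypothetical, and the actual lattice construction and splat/blur/slice steps are reduced to stand-ins:

```python
# Hypothetical sketch: cache the lattice keyed by the feature object's identity,
# so repeated CRF iterations over the same features reuse one construction.
class CachedPermutohedralFilter:
    def __init__(self):
        self._features_id = None
        self._lattice = None

    def _build_lattice(self, features):
        # Stand-in for the expensive permutohedral lattice construction.
        return {"n_points": len(features)}

    def __call__(self, values, features):
        if self._features_id != id(features):
            self._lattice = self._build_lattice(features)
            self._features_id = id(features)
        # Stand-in for splat/blur/slice using the cached lattice.
        return values

filt = CachedPermutohedralFilter()
features = [0.1, 0.2, 0.3]
for _ in range(5):                          # CRF mean-field iterations
    out = filt([1.0, 2.0, 3.0], features)   # lattice built only on the first call
```

Since the CRF applies the same filter with the same features every iteration, this amortises the construction cost over all iterations instead of paying it each time.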

tvercaut (Member) commented

Nice. I guess your comparison against crf-as-rnn is in 2D; crf-as-rnn only works in 2D out of the box, right?
It would be especially worth checking against SimpleCRF in 3D.

I expect the runtime of SimpleCRF in 2D to be similar to crf-as-rnn.

The crf-as-rnn implementation uses the code from Philipp Krähenbühl for the PHL:
https://github.com/sadeepj/crfasrnn_pytorch/blob/master/crfasrnn/permutohedral.h
but does the outer loop in Python.

In 2D, SimpleCRF wraps the entire CRF code from Philipp Krähenbühl which includes the same PHL code:
https://github.com/HiLab-git/SimpleCRF/tree/master/dependency/densecrf

In 3D, SimpleCRF wraps Kostas Kamnitsas's extension of Philipp Krähenbühl's CRF code:
https://github.com/HiLab-git/SimpleCRF/tree/master/dependency/densecrf3d
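The structural difference above is where the mean-field outer loop lives: crf-as-rnn runs it in Python around a C++ PHL filter, while SimpleCRF wraps the whole loop in C++. A schematic of that outer loop on a 1-D toy problem, with the permutohedral filtering reduced to a trivial neighbour average (all names and constants here are illustrative):

```python
import math

def mean_field_crf(unary, n_iterations=5, weight=0.5):
    """Schematic dense-CRF mean-field loop over a 1-D toy problem.

    `unary` holds per-site negative log-probabilities for one label;
    the permutohedral-lattice filtering is reduced to a neighbour average.
    """
    q = [math.exp(-u) for u in unary]          # initialise Q from the unaries
    for _ in range(n_iterations):
        # Message passing (stand-in for the PHL filter).
        msg = [
            (q[max(i - 1, 0)] + q[i] + q[min(i + 1, len(q) - 1)]) / 3.0
            for i in range(len(q))
        ]
        # Compatibility transform + unary term, then renormalise.
        q = [math.exp(-u - weight * m) for u, m in zip(unary, msg)]
        total = sum(q)
        q = [v / total for v in q]
    return q

probs = mean_field_crf([1.0, 0.5, 2.0])
```

Per-iteration Python overhead in such a loop is negligible next to the filtering cost, so the main performance lever is the filter itself, which is why the PHL optimisation discussed above matters.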

@Nic-Ma Nic-Ma added the question Further information is requested label May 29, 2021
vikashg commented Jan 5, 2024

Closing because of inactivity.

@vikashg vikashg closed this as completed Jan 5, 2024