From 8ef8770784ebcc7cb5bc807e0843aaf9d6228e6d Mon Sep 17 00:00:00 2001
From: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Date: Thu, 10 Oct 2024 18:23:16 +0800
Subject: [PATCH] Enhance README in vista2d and pathology tumor detection
 (#695)

- enhance the dependency description in vista2d
- add notes on the random NCCL timeout issue in the pathology tumor detection bundle

### Status
**Ready**

### Please ensure all the checkboxes:
<!--- Put an `x` in all the boxes that apply, and remove the items that
do not apply -->
- [x] Codeformat tests passed locally by running `./runtests.sh
--codeformat`.
- [ ] In-line docstrings updated.
- [ ] Update `version` and `changelog` in `metadata.json` if changing an
existing bundle.
- [ ] Please ensure the naming rules in config files meet our
requirements (please refer to: `CONTRIBUTING.md`).
- [ ] Ensure versions of packages such as `monai`, `pytorch` and `numpy`
are correct in `metadata.json`.
- [ ] Descriptions should be consistent with the content, such as
`eval_metrics` of the provided weights and TorchScript modules.
- [ ] Files larger than 25MB are excluded and replaced by providing
download links in `large_file.yml`.
- [ ] Avoid using paths that contain personal information within config
files (such as `/home/your_name/` for `"bundle_root"`).

---------

Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
---
 .../configs/metadata.json                          |  3 ++-
 .../pathology_tumor_detection/configs/train.json   |  4 ++--
 models/pathology_tumor_detection/docs/README.md    |  8 ++++++++
 models/vista2d/configs/metadata.json               |  3 ++-
 models/vista2d/docs/README.md                      | 14 ++++++++++----
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/models/pathology_tumor_detection/configs/metadata.json b/models/pathology_tumor_detection/configs/metadata.json
index e3c7cfa0..32f7cab6 100644
--- a/models/pathology_tumor_detection/configs/metadata.json
+++ b/models/pathology_tumor_detection/configs/metadata.json
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.6.1",
+    "version": "0.6.2",
     "changelog": {
+        "0.6.2": "enhance readme for nccl timeout issue",
         "0.6.1": "fix multi-gpu issue",
         "0.6.0": "use monai 1.4 and update large files",
         "0.5.9": "update to use monai 1.3.1",
diff --git a/models/pathology_tumor_detection/configs/train.json b/models/pathology_tumor_detection/configs/train.json
index 52344bad..78dba67f 100644
--- a/models/pathology_tumor_detection/configs/train.json
+++ b/models/pathology_tumor_detection/configs/train.json
@@ -174,7 +174,7 @@
             "_target_": "DataLoader",
             "dataset": "@train#dataset",
             "batch_size": 128,
-            "pin_memory": true,
+            "pin_memory": false,
             "num_workers": 8
         },
         "inferer": {
@@ -325,7 +325,7 @@
             "_target_": "DataLoader",
             "dataset": "@validate#dataset",
             "batch_size": 128,
-            "pin_memory": true,
+            "pin_memory": false,
             "shuffle": false,
             "num_workers": 8
         },
diff --git a/models/pathology_tumor_detection/docs/README.md b/models/pathology_tumor_detection/docs/README.md
index 46cc761a..cf04c829 100644
--- a/models/pathology_tumor_detection/docs/README.md
+++ b/models/pathology_tumor_detection/docs/README.md
@@ -135,6 +135,14 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
 
 Please note that the distributed training-related options depend on the actual running environment; thus, users may need to remove `--standalone`, modify `--nnodes`, or do some other necessary changes according to the machine used. For more details, please refer to [pytorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).
 
+**Note:** When using a container based on the [NVIDIA PyTorch container release 24.0x](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes), you may encounter intermittent NCCL timeout errors. To mitigate this issue, consider the following adjustments:
+
+- Reduce `num_workers`: Using fewer data loader workers can help minimize these errors.
+- Keep `pin_memory` set to `false`: Disabling pinned memory may reduce the likelihood of timeouts. This bundle's `configs/train.json` disables it by default.
+- Switch to the `gloo` backend: As a workaround, set the distributed training backend to `gloo` instead of `nccl` to avoid NCCL-related timeouts.
+
+You can apply these settings by appending overrides to the training command, such as `--train#dataloader#num_workers 0` or `--train#dataloader#pin_memory false`.
+
 #### Execute inference
 
 ```
diff --git a/models/vista2d/configs/metadata.json b/models/vista2d/configs/metadata.json
index a1cba412..cd5ae1dc 100644
--- a/models/vista2d/configs/metadata.json
+++ b/models/vista2d/configs/metadata.json
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
-    "version": "0.2.9",
+    "version": "0.3.0",
     "changelog": {
+        "0.3.0": "update readme",
         "0.2.9": "fix unsupported data dtype in findContours",
         "0.2.8": "remove relative path in readme",
         "0.2.7": "enhance readme",
diff --git a/models/vista2d/docs/README.md b/models/vista2d/docs/README.md
index 4ff5cfa1..afcdb12a 100644
--- a/models/vista2d/docs/README.md
+++ b/models/vista2d/docs/README.md
@@ -52,7 +52,12 @@ The default dataset for training, validation, and inference is the [Cellpose](ht
 Additionally, all data lists are available in the `datalists.zip` file located in the root directory of the bundle. Extract the contents of the `.zip` file to access the data lists.
 
 ### Dependencies
-Please refer to `required_packages_version` in `configs/metadata.json` to install all necessary dependencies before executing.
+Please refer to the `required_packages_version` section in `configs/metadata.json` to install all necessary dependencies before execution. If you are using the MONAI container, you can run the commands below and safely ignore any "opencv-python-headless not installed" message, because OpenCV is already included in the container.
+
+```
+pip install fastremap==1.15.0 roifile==2024.5.24 natsort==8.4.0
+pip install --no-deps cellpose
+```
 
 Important Note: if your environment already contains OpenCV, installing `cellpose` may lead to conflicts and produce errors such as:
 
@@ -60,13 +65,14 @@ Important Note: if your environment already contains OpenCV, installing `cellpos
 AttributeError: partially initialized module 'cv2' has no attribute 'dnn' (most likely due to a circular import)
 ```
 
-when executing. To resolve this issue, please uninstall OpenCV and then re-install `cellpose` with a command like:
+To resolve this, first remove OpenCV with the following command, and then reinstall `cellpose`:
 
 ```Bash
-pip uninstall -y opencv && rm /usr/local/lib/python3.x/dist-packages/cv2
+pip uninstall -y opencv && rm -r /usr/local/lib/python3.*/dist-packages/cv2
 ```
+The `3.*` glob matches your Python version directory. If it matches more than one directory, replace `3.*` with your actual Python version (e.g., `3.10`).
 
-Alternatively, you can use the following command to install `cellpose` without its dependencies:
+Alternatively, you can install `cellpose` without its dependencies to avoid potential conflicts:
 
 ```
 pip install --no-deps cellpose