Merge branch 'NVIDIA:main' into fix_filter_job
yanchengnv authored Feb 7, 2025
2 parents fb0d79e + 9bb23a3 commit dacb356
Showing 40 changed files with 2,718 additions and 661 deletions.
49 changes: 18 additions & 31 deletions examples/advanced/nlp-ner/README.md
@@ -22,11 +22,13 @@ pip install -r ./requirements.txt

The raw data can be accessed from the [official page](https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/).
In this example, we use the preprocessed csv files from the reference repo above, which can be downloaded [here](https://drive.google.com/drive/folders/13wROtEAnMgWpLMIGHB5CY1BQ1Xe2XqhG). Please download the three files `train.csv`, `dev.csv`, and `test.csv`.
In the following, we assume the downloaded files are placed in a folder `DATASET_ROOT`, which defaults to `/tmp/nvflare/data/nlp_ner`.

We then use the preprocessed data to generate random splits for both 4-client and 2-client experiments.
Please modify `DATASET_ROOT` below to point to the folder containing the three downloaded csv files.
```commandline
bash prepare_data.sh DATASET_ROOT
DATASET_ROOT=/tmp/nvflare/data/nlp_ner
bash prepare_data.sh $DATASET_ROOT
```
The expected output is
```
@@ -52,31 +54,14 @@ Let's take a closer look at the word-label correspondence:
As shown above, the task is to capture the keywords related to medical findings.
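For reference, the random per-client splitting that `prepare_data.sh` performs could be sketched in a few lines of pandas. This is only an illustrative sketch, not the actual script: the output naming, the site count, and the choice to split only `train.csv` and `dev.csv` are assumptions.
```python
# Hypothetical sketch of a random per-site split; the real logic lives
# in prepare_data.sh and its helper scripts and may differ.
import os

import numpy as np
import pandas as pd


def split_csv(src_csv: str, out_dir: str, num_sites: int, seed: int = 0) -> None:
    """Shuffle the rows of src_csv and deal them out evenly across sites."""
    df = pd.read_csv(src_csv)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    os.makedirs(out_dir, exist_ok=True)
    name = os.path.basename(src_csv)
    for site, chunk in enumerate(np.array_split(idx, num_sites), start=1):
        # Each site gets an equal-sized random subset of the rows.
        df.iloc[chunk].to_csv(os.path.join(out_dir, f"site-{site}_{name}"), index=False)


dataset_root = "/tmp/nvflare/data/nlp_ner"
for split in ("train.csv", "dev.csv"):
    split_csv(os.path.join(dataset_root, split),
              os.path.join(dataset_root, "4_split"), num_sites=4)
```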

## Run automated experiments
We run the federated training on a single machine with the NVFlare Simulator via the [JobAPI](https://nvflare.readthedocs.io/en/main/programming_guide/fed_job_api.html):
```
python3 nlp_fl_job.py --model_name Bert
python3 nlp_fl_job.py --model_name GPT
```
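For orientation, a JobAPI script along the lines of `nlp_fl_job.py` might look roughly like the sketch below. The task-script path, its arguments, and the round/client counts are illustrative assumptions; see `nlp_fl_job.py` in this folder for the actual configuration.
```python
# A minimal sketch, assuming the current JobAPI (FedJob / FedAvg /
# ScriptRunner). Script name, arguments, and counts are assumptions.
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

n_clients = 4   # 4 sites for the BERT run in this example
num_rounds = 5  # assumed; the actual round count is set in nlp_fl_job.py

job = FedJob(name="bert_ncbi")

# Server side: a FedAvg controller coordinates the training rounds.
# (Real jobs usually also register an initial model on the server,
# e.g. via job.to(PTModel(...), "server"); omitted here for brevity.)
job.to(FedAvg(num_clients=n_clients, num_rounds=num_rounds), "server")

# Client side: every site runs the same training script on its own split.
for i in range(n_clients):
    runner = ScriptRunner(
        script="src/nlp_fl.py",            # assumed task-script name
        script_args="--model_name Bert",   # assumed arguments
    )
    job.to(runner, f"site-{i + 1}")

# Execute everything locally with the FL simulator.
job.simulator_run("/tmp/nvflare/workspaces/bert_ncbi", gpu="0")
```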

## Results
In this example, we run 4 clients for the BERT model and 2 clients for the GPT-2 model.

### Validation curve on each site
In this example, each client computes its validation scores using its own validation set.
@@ -94,27 +79,29 @@ The testing score is computed for the global model over the testing set.
We provide a script for performing validation on the testing data.
Please modify `DATASET_ROOT` below:
```
DATASET_ROOT=/tmp/nvflare/data/nlp_ner
export PYTHONPATH=${PWD}
bash test_global_model.sh ${DATASET_ROOT}
```
The test results are:
```
BERT
              precision    recall  f1-score   support
           _       0.96      0.98      0.97      1255
   micro avg       0.96      0.98      0.97      1255
   macro avg       0.96      0.98      0.97      1255
weighted avg       0.96      0.98      0.97      1255

GPT-2
              precision    recall  f1-score   support
           _       0.87      0.90      0.88      1255
   micro avg       0.87      0.90      0.88      1255
   macro avg       0.87      0.90      0.88      1255
weighted avg       0.87      0.90      0.88      1255
```
Note that training is not deterministic, so the numbers can vary slightly between runs.
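For orientation, scores in this layout can be produced with `seqeval`'s `classification_report`, which evaluates whole entity spans rather than single tokens. This is only a sketch with placeholder tags, not the example's actual evaluation code (that is driven by `test_global_model.sh`):
```python
# A minimal sketch, assuming seqeval is installed; the label sequences
# below are placeholders, not real predictions from this example.
from seqeval.metrics import classification_report

# Token-level BIO tags: ground truth vs. model predictions.
y_true = [["O", "B-_", "I-_", "O"], ["B-_", "O"]]
y_pred = [["O", "B-_", "I-_", "O"], ["O", "O"]]

# seqeval scores complete entity spans and reports per-label rows plus
# micro/macro/weighted averages, matching the tables above.
print(classification_report(y_true, y_pred))
```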