diff --git a/examples/advanced/sklearn-kmeans/sklearn_kmeans_iris.ipynb b/examples/advanced/sklearn-kmeans/sklearn_kmeans_iris.ipynb
index fe5383f993..4076ac52cd 100644
--- a/examples/advanced/sklearn-kmeans/sklearn_kmeans_iris.ipynb
+++ b/examples/advanced/sklearn-kmeans/sklearn_kmeans_iris.ipynb
@@ -103,7 +103,7 @@
"id": "bd0713e2-e393-41c0-9da0-392535cf8a54",
"metadata": {},
"source": [
- "## 4. Run simulated kmeans experiment\n",
+ "## 3. Run simulated kmeans experiment\n",
"We run the federated training using NVFlare Simulator via [JobAPI](https://nvflare.readthedocs.io/en/main/programming_guide/fed_job_api.html):"
]
},
@@ -124,7 +124,7 @@
"id": "913e9ee2-e993-442d-a525-d2baf92af539",
"metadata": {},
"source": [
- "## 5. Result visualization\n",
+ "## 4. Result visualization\n",
"Model accuracy is computed as the homogeneity score between the cluster formed and the ground truth label, which can be visualized in tensorboard."
]
},
@@ -140,14 +140,6 @@
"%load_ext tensorboard\n",
"%tensorboard --logdir /tmp/nvflare/workspace/works/kmeans/sklearn_kmeans_uniform_3_clients"
]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "bea9ebcd-96f5-45c8-a490-0559fab9991f",
- "metadata": {},
- "outputs": [],
- "source": []
}
],
"metadata": {
diff --git a/examples/advanced/vertical_xgboost/README.md b/examples/advanced/vertical_xgboost/README.md
deleted file mode 100644
index bddf82f2a6..0000000000
--- a/examples/advanced/vertical_xgboost/README.md
+++ /dev/null
@@ -1,103 +0,0 @@
-# Vertical Federated XGBoost
-This example shows how to use vertical federated learning with [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) on tabular data.
-Here we use the optimized gradient boosting library [XGBoost](https://github.com/dmlc/xgboost) and leverage its federated learning support.
-
-Before starting please make sure you set up a [virtual environment](../../README.md#set-up-a-virtual-environment) and install the additional requirements:
-```
-python3 -m pip install -r requirements.txt
-```
-
-## Preparing HIGGS Data
-In this example we showcase a binary classification task based on the [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/), which contains 11 million instances, each with 28 features and 1 class label.
-
-### Download and Store Dataset
-First download the dataset from the HIGGS link above, which is a single zipped `.csv` file.
-By default, we assume the dataset is downloaded, uncompressed, and stored in `DATASET_ROOT/HIGGS.csv`.
-
-### Vertical Data Splits
-In vertical federated learning, sites share overlapping data samples (rows), but contain different features (columns).
-In order to achieve this, we split the HIGGS dataset both horizontally and vertically. As a result, each site has an overlapping subset of the rows and a subset of the 29 columns. Since the first column of HIGGS is the class label, we give site-1 the label column for simplicity's sake.
-
-
-
-Run the following command to prepare the data splits:
-```
-./prepare_data.sh DATASET_ROOT
-```
-> **_NOTE:_** make sure to put the correct path for `DATASET_ROOT`.
-
-### Private Set Intersection (PSI)
-Since not every site will have the same set of data samples (rows), we can use PSI to compare encrypted versions of the sites' datasets in order to jointly compute the intersection based on common IDs. In this example, the HIGGS dataset does not contain unique identifiers so we add a temporary `uid_{idx}` to each instance and give each site a portion of the HIGGS dataset that includes a common overlap. Afterwards the identifiers are dropped since they are only used for matching, and training is then done on the intersected data. To learn more about our PSI protocol implementation, see our [psi example](../psi/README.md).
-
-> **_NOTE:_** The uid can be a composition of multiple variables with a transformation, however in this example we use indices for simplicity. PSI can also be used for computing the intersection of overlapping features, but here we give each site unique features.
-
-Create the psi job using the predefined psi_csv template:
-```
-nvflare job create -j ./jobs/vertical_xgb_psi -w psi_csv -sd ./code/psi -force
-```
-
-Run the psi job to calculate the dataset intersection of the clients at `psi/intersection.txt` inside the psi workspace:
-```
-nvflare simulator ./jobs/vertical_xgb_psi -w /tmp/nvflare/vertical_xgb_psi -n 2 -t 2
-```
-
-## Vertical XGBoost Federated Learning with FLARE
-
-This Vertical XGBoost example leverages the recently added [vertical federated learning support](https://github.com/dmlc/xgboost/issues/8424) in the XGBoost open-source library. This allows for the distributed XGBoost algorithm to operate in a federated manner on vertically split data.
-
-For integrating with FLARE, we can use the predefined `XGBFedController` to run the federated server and control the workflow.
-
-Next, we can use `FedXGBHistogramExecutor` and set XGBoost training parameters in `config_fed_client.json`, or define new training logic by overwriting the `xgb_train()` method.
-
-Lastly, we must subclass `XGBDataLoader` and implement the `load_data()` method. For vertical federated learning, it is important when creating the `xgb.Dmatrix` to set `data_split_mode=1` for column mode, and to specify the presence of a label column `?format=csv&label_column=0` for the csv file. To support PSI, the dataloader can also read in the dataset based on the calculated intersection, and split the data into training and validation.
-
-> **_NOTE:_** For secure mode, make sure to provide the required certificates for the federated communicator.
-
-## Run the Example
-Create the vertical xgboost job using the predefined vertical_xgb template:
-```
-nvflare job create -j ./jobs/vertical_xgb -w vertical_xgb -sd ./code/vertical_xgb -force
-```
-
-Run the vertical xgboost job:
-```
-nvflare simulator ./jobs/vertical_xgb -w /tmp/nvflare/vertical_xgb -n 2 -t 2
-```
-
-The model will be saved to `test.model.json`.
-
-(Feel free to modify the scripts and jobs as desired to change arguments such as number of clients, dataset sizes, training params, etc.)
-
-### GPU Support
-By default, CPU based training is used.
-
-In order to enable GPU accelerated training, first ensure that your machine has CUDA installed and has at least one GPU.
-In `config_fed_client.json` set `"use_gpus": true` and `"tree_method": "hist"` in `xgb_params`.
-Then, in `FedXGBHistogramExecutor` we can use the `device` parameter to map each rank to a GPU device ordinal in `xgb_params`.
-If using multiple GPUs, we can map each rank to a different GPU device, however you can also map each rank to the same GPU device if using a single GPU.
-
-We can create a GPU enabled job using the job CLI:
-```
-nvflare job create -j ./jobs/vertical_xgb_gpu -w vertical_xgb \
--f config_fed_client.conf \
--f config_fed_server.conf use_gpus=true \
--sd ./code/vertical_xgb \
--force
-```
-
-This job can be run:
-```
-nvflare simulator ./jobs/vertical_xgb_gpu -w /tmp/nvflare/vertical_xgb_gpu -n 2 -t 2
-```
-
-## Results
-Model accuracy can be visualized in tensorboard:
-```
-tensorboard --logdir /tmp/nvflare/vertical_xgb/server/simulate_job/tb_events
-```
-
-An example training (pink) and validation (orange) AUC graph from running vertical XGBoost on HIGGS:
-(Used an intersection of 50000 samples across 5 clients each with different features,
-and ran for ~50 rounds due to early stopping.)
-
-
diff --git a/examples/advanced/vertical_xgboost/figs/vertical_xgboost_graph.png b/examples/advanced/vertical_xgboost/figs/vertical_xgboost_graph.png
deleted file mode 100644
index 56e7f2c03c..0000000000
Binary files a/examples/advanced/vertical_xgboost/figs/vertical_xgboost_graph.png and /dev/null differ
diff --git a/examples/advanced/vertical_xgboost/prepare_data.sh b/examples/advanced/vertical_xgboost/prepare_data.sh
deleted file mode 100755
index d938e0c4eb..0000000000
--- a/examples/advanced/vertical_xgboost/prepare_data.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/usr/bin/env bash
-DATASET_PATH="${1}/HIGGS.csv"
-OUTPUT_PATH="/tmp/nvflare/vertical_xgb_data"
-OUTPUT_FILE="higgs.data.csv"
-
-if [ ! -f "${DATASET_PATH}" ]
-then
- echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
-fi
-
-echo "Generating HIGGS data splits, reading from ${DATASET_PATH}"
-
-python3 utils/prepare_data.py \
---data_path "${DATASET_PATH}" \
---site_num 2 \
---rows_total_percentage 0.02 \
---rows_overlap_percentage 0.25 \
---out_path "${OUTPUT_PATH}" \
---out_file "${OUTPUT_FILE}"
-
-# Note: HIGGS has 11000000 preshuffled instances; using rows_total_percentage to reduce PSI time for example
-
-echo "Data splits are generated in ${OUTPUT_PATH}"
diff --git a/examples/advanced/vertical_xgboost/requirements.txt b/examples/advanced/vertical_xgboost/requirements.txt
deleted file mode 100644
index b0d1cd29e7..0000000000
--- a/examples/advanced/vertical_xgboost/requirements.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-nvflare~=2.5.0rc
-openmined.psi==1.1.1
-pandas
-torch
-tensorboard
-# require xgboost 2.2 version, for now need to install a nightly build
-https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
diff --git a/examples/advanced/xgboost/README.md b/examples/advanced/xgboost/README.md
index 207e94334b..9c9844d217 100644
--- a/examples/advanced/xgboost/README.md
+++ b/examples/advanced/xgboost/README.md
@@ -1,220 +1,54 @@
# Federated Learning for XGBoost
-
-Please make sure you set up virtual environment and Jupyterlab follows [example root readme](../../README.md)
-
-## Introduction to XGBoost and HIGGS Data
-
-You can also follow along in this [notebook](./data_job_setup.ipynb) for an interactive experience.
-
-### XGBoost
-These examples show how to use [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) on tabular data applications.
-They use [XGBoost](https://github.com/dmlc/xgboost),
-which is an optimized distributed gradient boosting library.
-
-### HIGGS
-The examples illustrate a binary classification task based on [HIGGS dataset](https://mlphysics.ics.uci.edu/data/higgs/).
-This dataset contains 11 million instances, each with 28 attributes.
-
-Please note that the UCI's website may experience occasional downtime.
-
-## Federated Training of XGBoost
-Several mechanisms have been proposed for training an XGBoost model in a federated learning setting.
-In these examples, we illustrate the use of NVFlare to carry out *horizontal* federated learning using two approaches: histogram-based collaboration and tree-based collaboration.
-
-### Horizontal Federated Learning
-Under horizontal setting, each participant / client joining the federated learning will have part of the whole data / instances / examples/ records, while each instance has all the features.
-This is in contrast to vertical federated learning, where each client has part of the feature values for each instance.
-
-#### Histogram-based Collaboration
-The histogram-based collaboration federated XGBoost approach leverages NVFlare integration of recently added [federated learning support](https://github.com/dmlc/xgboost/issues/7778) in the XGBoost open-source library,
-which allows the existing *distributed* XGBoost training algorithm to operate in a federated manner,
-with the federated clients acting as the distinct workers in the distributed XGBoost algorithm.
-
-In distributed XGBoost, the individual workers share and aggregate coarse information about their respective portions of the training data,
-as required to optimize tree node splitting when building the successive boosted trees.
-
-The shared information is in the form of quantile sketches of feature values as well as corresponding sample gradient and sample Hessian histograms.
-
-Under federated histogram-based collaboration, precisely the same information is exchanged among the clients.
-
-The main differences are that the data is partitioned across the workers according to client data ownership, rather than being arbitrarily partionable, and all communication is via an aggregating federated [gRPC](https://grpc.io) server instead of direct client-to-client communication.
-
-Histograms from different clients, in particular, are aggregated in the server and then communicated back to the clients.
-
-See [histogram-based/README](histogram-based/README.md) for more information on the histogram-based collaboration.
-
-#### Tree-based Collaboration
-Under tree-based collaboration, individual trees are independently trained on each client's local data without aggregating the global sample gradient histogram information.
-Trained trees are collected and passed to the server / other clients for aggregation and further boosting rounds.
-
-The XGBoost Booster api is leveraged to create in-memory Booster objects that persist across rounds to cache predictions from trees added in previous rounds and retain other data structures needed for training.
-
-See [tree-based/README](tree-based/README.md) for more information on two different types of tree-based collaboration algorithms.
-
-
-## HIGGS Data Preparation
-For data preparation, you can follow this [notebook](./data_job_setup.ipynb):
-### Download and Store Data
-To run the examples, we first download the dataset from the HIGGS link above, which is a single `.csv` file.
-By default, we assume the dataset is downloaded, uncompressed, and stored in `~/dataset/HIGGS.csv`.
-
-> **_NOTE:_** If the dataset is downloaded in another place,
-> make sure to modify the corresponding `DATASET_PATH` inside `prepare_data.sh`.
-
-### Data Split
-Since HIGGS dataset is already randomly recorded,
-data split will be specified by the continuous index ranges for each client,
-rather than a vector of random instance indices.
-We provide four options to split the dataset to simulate the non-uniformity in data quantity:
-
-1. uniform: all clients has the same amount of data
-2. linear: the amount of data is linearly correlated with the client ID (1 to M)
-3. square: the amount of data is correlated with the client ID in a squared fashion (1^2 to M^2)
-4. exponential: the amount of data is correlated with the client ID in an exponential fashion (exp(1) to exp(M))
-
-The choice of data split depends on dataset and the number of participants.
-
-For a large dataset like HIGGS, if the number of clients is small (e.g. 5),
-each client will still have sufficient data to train on with uniform split,
-and hence exponential would be used to observe the performance drop caused by non-uniform data split.
-If the number of clients is large (e.g. 20), exponential split will be too aggressive, and linear/square should be used.
-
-Data splits used in this example can be generated with
-```
-bash prepare_data.sh
-```
-
-This will generate data splits for three client sizes: 2, 5 and 20, and 3 split conditions: uniform, square, and exponential.
-If you want to customize for your experiments, please check `utils/prepare_data_split.py`.
-
-> **_NOTE:_** The generated train config files will be stored in the folder `/tmp/nvflare/xgboost_higgs_dataset/`,
-> and will be used by jobs by specifying the path within `config_fed_client.json`
-
-
-## HIGGS job configs preparation under various training schemes
-
-Please follow the [Installation](../../getting_started/README.md) instructions to install NVFlare.
-
-We then prepare the NVFlare job configs for different settings by running
+This example demonstrates how to use NVFlare to train an XGBoost model in a federated learning setting.
+Several potential variations of federated XGBoost are illustrated, including:
+- non-secure horizontal collaboration with histogram-based and tree-based mechanisms.
+- non-secure vertical collaboration with histogram-based mechanism.
+- secure horizontal and vertical collaboration with histogram-based mechanism and homomorphic encryption.
+
+To run the examples and notebooks, please make sure you set up a virtual environment and Jupyterlab, following [the example root readme](../../README.md)
+and install the additional requirements:
```
-bash prepare_job_config.sh
+python3 -m pip install -r requirements.txt
```
-This script modifies settings from base job configuration
-(`./tree-based/jobs/bagging_base` or `./tree-based/jobs/cyclic_base`
-or `./histogram-based/jobs/base`),
-and copies the correct data split file generated in the data preparation step.
-
-> **_NOTE:_** To customize your own job configs, you can just edit from the generated ones.
-> Or check the code in `./utils/prepare_job_config.py`.
-
-The script will generate a total of 10 different configs in `tree-based/jobs` for tree-based algorithm:
+## XGBoost
+XGBoost is a machine learning algorithm that uses decision/regression trees to perform classification and regression tasks,
+mapping a vector of feature values to its label prediction. It is especially powerful for tabular data, and even in the age of LLMs
+it is still widely used for many tabular use cases, favored for its explainability and efficiency.
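+
+As a quick, non-federated reference point, a minimal centralized XGBoost run on tabular data looks roughly like the sketch below (toy synthetic data is used here as a stand-in for HIGGS; the parameters mirror those used later in the job scripts):
+```
+import xgboost as xgb
+from sklearn.datasets import make_classification
+from sklearn.model_selection import train_test_split
+
+# Toy stand-in for a tabular dataset such as HIGGS (28 features, binary label)
+X, y = make_classification(n_samples=10_000, n_features=28, random_state=0)
+X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)
+
+dtrain = xgb.DMatrix(X_train, label=y_train)
+dvalid = xgb.DMatrix(X_valid, label=y_valid)
+
+params = {"objective": "binary:logistic", "eval_metric": "auc", "tree_method": "hist", "max_depth": 8, "eta": 0.1}
+bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dvalid, "valid")])
+```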
-- tree-based cyclic training with uniform data split for 5 clients
-- tree-based cyclic training with non-uniform data split for 5 clients
-- tree-based bagging training with uniform data split and uniform shrinkage for 5 clients
-- tree-based bagging training with non-uniform data split and uniform shrinkage for 5 clients
-- tree-based bagging training with non-uniform data split and scaled shrinkage for 5 clients
-- tree-based cyclic training with uniform data split for 20 clients
-- tree-based cyclic training with non-uniform data split for 20 clients
-- tree-based bagging training with uniform data split and uniform shrinkage for 20 clients
-- tree-based bagging training with non-uniform data split and uniform shrinkage for 20 clients
-- tree-based bagging training with non-uniform data split and scaled shrinkage for 20 clients
+In these examples, we use [DMLC XGBoost](https://github.com/dmlc/xgboost), which is an optimized distributed gradient boosting library.
+It offers advanced features such as GPU-accelerated training and distributed/federated learning support.
+## Data
+We use two datasets: [HIGGS](https://mlphysics.ics.uci.edu/data/higgs/) and [creditcardfraud](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)
+for the experiments. Both are binary classification tasks, but at significantly different scales:
+the HIGGS dataset contains 11 million instances, each with 28 attributes, while the creditcardfraud dataset contains 284,807 instances, each with 30 attributes.
-The script will also generate 2 configs in `histogram-based/jobs` for histogram-base algorithm:
+Given its large scale, we use the HIGGS dataset to compare the performance of different federated learning settings,
+and the smaller creditcardfraud dataset to demonstrate secure federated learning with homomorphic encryption while keeping computation time manageable.
+Please note that the websites may experience occasional downtime.
-- histogram-based training with uniform data split for 2 clients
-- histogram-based training with uniform data split for 5 clients
+First download the datasets from the links above: HIGGS comes as a single zipped `HIGGS.csv.gz` file, and creditcardfraud as a single `creditcard.csv` file.
+By default, we assume the datasets are downloaded, uncompressed, and stored as `DATASET_ROOT/HIGGS.csv` and `DATASET_ROOT/creditcard.csv`.
+Each row corresponds to a data sample, and each column corresponds to a feature.
-## Run experiments for tree-based and histogram-based settings
-After you run the two scripts `prepare_data.sh` and `prepare_job_config.sh`,
-please go to sub-folder [tree-based](tree-based) for running tree-based algorithms,
-and sub-folder [histogram-based](histogram-based) for running histogram-based algorithms.
+## Collaboration Modes and Data Split
+There are essentially two collaboration modes: horizontal and vertical.
+- In the horizontal case, each participant has access to the same features (columns) of different data samples (rows).
+In this case, every site holds equal status as a "label owner".
+- In the vertical case, each client has access to different features (columns) of the same data samples (rows).
+We assume that only one site is the "label owner" (also called the "active party").
+
+To simulate these two collaboration modes, we split the two datasets both horizontally and vertically (see the sketch below), and
+we give site-1 the label column for simplicity.
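+
+A minimal sketch of what the two split types look like on a dataframe (the half-and-half row/column split shown here is only illustrative; the actual split utilities are described in the sub-example READMEs):
+```
+import pandas as pd
+
+df = pd.read_csv("DATASET_ROOT/HIGGS.csv", header=None)  # label is in column 0
+
+# Horizontal split: same columns, disjoint rows
+site1_h = df.iloc[: len(df) // 2]
+site2_h = df.iloc[len(df) // 2 :]
+
+# Vertical split: shared (overlapping) rows, disjoint columns; site-1 keeps the label
+site1_v = df.iloc[:, :15]   # label + first 14 features
+site2_v = df.iloc[:, 15:]   # remaining features
+```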
-## GPU support
-By default, CPU based training is used.
-
-If the CUDA is installed on the site, tree construction and prediction can be
-accelerated using GPUs.
-
-To enable GPU accelerated training, in `config_fed_client.json` set `"use_gpus": true` and `"tree_method": "hist"`.
-Then, in `FedXGBHistogramExecutor` we use the `device` parameter to map each rank to a GPU device ordinal in `xgb_params`.
-For a single GPU, assuming it has enough memory, we can map each rank to the same device with `params["device"] = f"cuda:0"`.
-
-### Multi GPU support
-
-Multiple GPUs can be supported by running one NVFlare client for each GPU.
-
-In the `xgb_params`, we can set the `device` parameter to map each rank to a corresponding GPU device ordinal in with `params["device"] = f"cuda:{self.rank}"`
-
-Assuming there are 2 physical client sites, each with 2 GPUs (id 0 and 1).
-We can start 4 NVFlare client processes (site-1a, site-1b, site-2a, site-2b), one for each GPU.
-The job layout looks like this,
-::
-
- xgb_multi_gpu_job
- ├── app_server
- │ └── config
- │ └── config_fed_server.json
- ├── app_site1_gpu0
- │ └── config
- │ └── config_fed_client.json
- ├── app_site1_gpu1
- │ └── config
- │ └── config_fed_client.json
- ├── app_site2_gpu0
- │ └── config
- │ └── config_fed_client.json
- ├── app_site2_gpu1
- │ └── config
- │ └── config_fed_client.json
- └── meta.json
-
-Each app is deployed to its own client site. Here is the `meta.json`,
-::
-
- {
- "name": "xgb_multi_gpu_job",
- "resource_spec": {
- "site-1a": {
- "num_of_gpus": 1,
- "mem_per_gpu_in_GiB": 1
- },
- "site-1b": {
- "num_of_gpus": 1,
- "mem_per_gpu_in_GiB": 1
- },
- "site-2a": {
- "num_of_gpus": 1,
- "mem_per_gpu_in_GiB": 1
- },
- "site-2b": {
- "num_of_gpus": 1,
- "mem_per_gpu_in_GiB": 1
- }
- },
- "deploy_map": {
- "app_server": [
- "server"
- ],
- "app_site1_gpu0": [
- "site-1a"
- ],
- "app_site1_gpu1": [
- "site-1b"
- ],
- "app_site2_gpu0": [
- "site-2a"
- ],
- "app_site2_gpu1": [
- "site-2b"
- ]
- },
- "min_clients": 4
- }
-
-For federated XGBoost, all clients must participate in the training. Therefore,
-`min_clients` must equal to the number of clients.
-
+## Federated Training of XGBoost
+Continue with this example for two scenarios:
+### [Federated XGBoost without Encryption](./fedxgb/README.md)
+This example includes instructions on running federated XGBoost without encryption under histogram-based and tree-based horizontal
+collaboration, and histogram-based vertical collaboration.
+
+### [Secure Federated XGBoost with Homomorphic Encryption](./fedxgb_secure/README.md)
+This example includes instructions on running secure federated XGBoost with homomorphic encryption under
+histogram-based horizontal and vertical collaboration. Note that tree-based collaboration does not have security concerns
+that can be addressed by encryption, so it is not covered there.
\ No newline at end of file
diff --git a/examples/advanced/xgboost/fedxgb/README.md b/examples/advanced/xgboost/fedxgb/README.md
new file mode 100644
index 0000000000..0eb70f7a38
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/README.md
@@ -0,0 +1,212 @@
+# Federated XGBoost
+Several mechanisms have been proposed for training an XGBoost model in a federated learning setting.
+In these examples, we illustrate the use of NVFlare to carry out *horizontal* federated learning using two approaches, histogram-based and tree-based collaboration,
+as well as *vertical* federated learning using histogram-based collaboration.
+
+## Horizontal Federated XGBoost
+Under the horizontal setting, each participant joining the federated learning holds a subset of
+the data samples / instances / records, while each sample contains all the features.
+
+### Histogram-based Collaboration
+The histogram-based collaboration federated XGBoost approach leverages NVFlare integration of [federated learning support](https://github.com/dmlc/xgboost/issues/7778) in the XGBoost open-source library,
+which allows the existing *distributed* XGBoost training algorithm to operate in a federated manner,
+with the federated clients acting as the distinct workers in the distributed XGBoost algorithm.
+
+In distributed XGBoost, the individual workers share and aggregate gradient information about their respective portions of the training data,
+as required to optimize tree node splitting when building the successive boosted trees.
+
+The shared information is in the form of quantile sketches of feature values as well as corresponding sample gradient and sample Hessian histograms.
+
+Under federated histogram-based collaboration, precisely the same information is exchanged among the clients.
+The main differences are that the data is partitioned across the workers according to client data ownership, rather than being arbitrarily partitionable, and all communication is via an aggregating federated [gRPC](https://grpc.io) server instead of direct client-to-client communication.
+Histograms from different clients, in particular, are aggregated in the server and then communicated back to the clients.
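+
+With NVFlare, this maps to an `XGBFedController` on the server and a `FedXGBHistogramExecutor` plus a data loader on each client. A condensed JobAPI sketch is shown below (the complete, runnable version with metrics streaming is `xgb_fl_job_horizontal.py`):
+```
+from src.higgs_data_loader import HIGGSDataLoader
+
+from nvflare.app_opt.xgboost.histogram_based.controller import XGBFedController
+from nvflare.app_opt.xgboost.histogram_based.executor import FedXGBHistogramExecutor
+from nvflare.job_config.api import FedJob
+
+job = FedJob(name="higgs_histogram", min_clients=2)
+job.to_server(XGBFedController(), id="xgb_controller")
+
+for site_id in range(1, 3):
+    executor = FedXGBHistogramExecutor(
+        data_loader_id="dataloader",
+        num_rounds=100,
+        xgb_params={"objective": "binary:logistic", "eval_metric": "auc", "tree_method": "hist"},
+    )
+    job.to(executor, f"site-{site_id}")
+    dataloader = HIGGSDataLoader(
+        data_split_filename=f"/tmp/nvflare/dataset/xgboost_higgs_horizontal/2_uniform/data_site-{site_id}.json"
+    )
+    job.to(dataloader, f"site-{site_id}", id="dataloader")
+
+job.simulator_run("/tmp/nvflare/workspace/works/higgs_histogram")
+```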
+
+### Tree-based Collaboration
+Under tree-based collaboration, individual trees are independently trained on each client's local data without aggregating the global sample gradient histogram information.
+Trained trees are collected and passed to the server / other clients for aggregation and / or further boosting rounds.
+Under this setting, we can further distinguish between two types of tree-based collaboration: cyclic and bagging.
+
+#### Cyclic Training
+"Cyclic XGBoost" is one way of performing tree-based federated boosting with
+multiple sites: at each round of tree boosting, instead of relying on the whole
+data statistics collected from all clients, the boosting relies on only 1 client's
+local data. The resulting tree sequence is then forwarded to the next client for
+the next round of boosting. Such training schemes have been proposed in the literature [1][2].
+
+#### Bagging Aggregation
+
+"Bagging XGBoost" is another way of performing tree-based federated boosting with multiple sites: at each round of tree boosting, all sites start from the same "global model", and boost a number of trees (in current example, 1 tree) based on their local data. The resulting trees are then send to server. A bagging aggregation scheme is applied to all the submitted trees to update the global model, which is further distributed to all clients for next round's boosting.
+
+This scheme bears a certain similarity to the [Random Forest mode](https://xgboost.readthedocs.io/en/stable/tutorials/rf.html) of XGBoost, where a forest of `num_parallel_tree` trees is boosted on random row/column splits rather than a single tree. Under the federated learning setting, the split is fixed per client rather than random, and no column subsampling is used.
+
+In addition to the basic uniform shrinkage setting where all clients use the same learning rate, based on our research we also enable scaled shrinkage across clients, weighting the aggregation according to each client's data size; this is shown to significantly improve the model's performance on non-uniform quantity splits over HIGGS data.
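+
+For illustration, the per-client scale factor is simply each site's share of the total training data (the `_get_lr_scale_from_split_json` helper in `xgb_fl_job_horizontal.py` computes exactly this from the data split files); the site sizes below are hypothetical:
+```
+# Per-site shrinkage scale, proportional to each site's share of the training data
+site_rows = {"site-1": 100_000, "site-2": 400_000, "site-3": 500_000}  # hypothetical sizes
+total = sum(site_rows.values())
+lr_scales = {site: rows / total for site, rows in site_rows.items()}
+print(lr_scales)  # {'site-1': 0.1, 'site-2': 0.4, 'site-3': 0.5}
+```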
+
+
+Specifically, the global model is updated by aggregating the trees from all clients as a forest, and this global model is then broadcast back to all clients for local prediction and further training.
+
+The XGBoost Booster API is leveraged to create in-memory Booster objects that persist across rounds, caching predictions from trees added in previous rounds and retaining the other data structures needed for training.
+
+## Vertical Federated XGBoost
+Under the vertical setting, each participant joining the federated learning holds
+a subset of the features, while all sites share the same overlapping instances.
+
+### Private Set Intersection (PSI)
+Since not every site will have the same set of data samples (rows), we will use PSI to first compare encrypted versions of the sites' datasets in order to jointly compute the intersection based on common IDs. In the following example, we add a `uid_{idx}` to each instance and give each site
+a portion of the dataset that includes a common overlap. After PSI, the identifiers are dropped since they are only used for matching, and training is then done on the intersected data. To learn more about our PSI protocol implementation, see our [psi example](../../psi/README.md).
+> **_NOTE:_** The uid can be a composition of multiple variables with a transformation, however in this example we use indices for simplicity. PSI can also be used for computing the intersection of overlapping features, but here we give each site unique features.
+
+### Histogram-based Collaboration
+Similar to its horizontal counterpart, histogram-based collaboration under the vertical setting
+aggregates the gradient information from each site and updates the global model accordingly, resulting in
+the same model as centralized / histogram-based horizontal training.
+We leverage the [vertical federated learning support](https://github.com/dmlc/xgboost/issues/8424) in the XGBoost open-source library. This allows for the distributed XGBoost algorithm to operate in a federated manner on vertically split data.
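+
+On the data loading side, the key detail is that the `DMatrix` must be created in column-split mode, and only the active party's file carries the label column. A minimal sketch (file paths are assumptions; the actual logic is in `src/vertical_data_loader.py`):
+```
+import xgboost as xgb
+
+# data_split_mode=1 tells XGBoost the data is split by columns (vertical);
+# label_column=0 marks the label, which only site-1 (the active party) holds.
+dtrain = xgb.DMatrix("train.csv?format=csv&label_column=0", data_split_mode=1)
+dvalid = xgb.DMatrix("valid.csv?format=csv&label_column=0", data_split_mode=1)
+```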
+
+## Data Preparation
+Assuming the HIGGS data has been downloaded following [the instructions](../README.md), we further split the data
+horizontally and vertically for federated learning.
+
+In the horizontal setting, each party holds different data samples with the same set of features.
+To simulate this, we split the HIGGS data by rows, assigning each party a subset of the data samples.
+In the vertical setting, each party holds different features of the same data samples, and usually the population
+on each site will not fully overlap. To simulate this, we split the HIGGS data by both columns and rows, so that each site
+has different features with overlapping data samples.
+More details are provided in the following sub-sections.
+
+
+Data splits used in this example can be generated with
+```
+DATASET_ROOT=~/.cache/dataset/HIGGS
+bash prepare_data.sh ${DATASET_ROOT}
+```
+Please modify the path according to your own dataset location.
+The generated horizontal train config files and vertical data files will be stored in the
+folder `/tmp/nvflare/dataset/`; this output path can be changed in the script `prepare_data.sh`.
+
+### Horizontal Data Split
+Since the HIGGS dataset is already randomly shuffled,
+the horizontal data split is specified by continuous index ranges for each client,
+rather than by a vector of random instance indices.
+We provide four options to split the dataset to simulate the non-uniformity in data quantity:
+
+1. uniform: all clients have the same amount of data
+2. linear: the amount of data is linearly correlated with the client ID (1 to M)
+3. square: the amount of data is correlated with the client ID in a squared fashion (1^2 to M^2)
+4. exponential: the amount of data is correlated with the client ID in an exponential fashion (exp(1) to exp(M))
+
+The choice of data split depends on the dataset and the number of participants.
+
+For a large dataset like HIGGS, if the number of clients is small (e.g. 5),
+each client will still have sufficient data to train on under a uniform split,
+so the exponential split is used to observe the performance drop caused by a non-uniform data split.
+If the number of clients is large (e.g. 20), the exponential split becomes too aggressive, and linear/square should be used instead.
+
+In this example, we generate data splits with three client sizes: 2, 5 and 20, under three split conditions: uniform, square, and exponential.
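+
+For illustration, the four split modes boil down to assigning each site a weight and normalizing; a hypothetical sketch (the actual implementation is `utils/prepare_data_horizontal.py`):
+```
+import numpy as np
+
+def split_sizes(total_rows, num_sites, method):
+    # weight per site under each split mode
+    ids = np.arange(1, num_sites + 1, dtype=float)
+    weights = {
+        "uniform": np.ones(num_sites),
+        "linear": ids,
+        "square": ids**2,
+        "exponential": np.exp(ids),
+    }[method]
+    return (weights / weights.sum() * total_rows).astype(int).tolist()
+
+print(split_sizes(10_000_000, 5, "exponential"))  # heavily skewed toward site-5
+```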
+
+### Vertical Data Split
+For the vertical setting, we simulate a realistic 2-client scenario where participants share overlapping data samples (rows) but hold different features (columns).
+We split the HIGGS dataset both horizontally and vertically. As a result, each site has an overlapping subset of the rows and a subset of the 29 columns. Since the first column of HIGGS is the class label, we give site-1 the label column for simplicity's sake.
+
+
+PSI is performed first to identify and match the overlapping samples, and training is then done on the intersected data.
+
+
+## Experiments
+We first run centralized training to obtain baseline performance, then run federated XGBoost training using the NVFlare Simulator via the [JobAPI](https://nvflare.readthedocs.io/en/main/programming_guide/fed_job_api.html).
+
+### Centralized Baselines
+For centralized training, we train the XGBoost model on the whole dataset, as well as on subsets with different subsample rates
+and parallel tree settings.
+```
+bash run_experiment_centralized.sh ${DATASET_ROOT}
+```
+The results by default will be stored in the folder `/tmp/nvflare/workspace/centralized/`.
+
+
+
+As shown, including multiple trees in a single round may not yield significant performance gain,
+and can even hurt accuracy if the subsample rate is too low (e.g. 0.05).
+
+### Horizontal Experiments
+The following cases will be covered:
+- Histogram-based collaboration based on uniform data split for 2 / 5 clients
+- Tree-based collaboration with cyclic training based on uniform / exponential / square data split for 5 / 20 clients
+- Tree-based collaboration with bagging training based on uniform / exponential / square data split for 5 / 20 clients w/ and w/o scaled learning rate
+
+Histogram-based experiments can be run with:
+```
+bash run_experiment_horizontal_histogram.sh
+```
+
+> **_NOTE:_** "histogram_v2" implements a fault-tolerant XGBoost training by using
+> NVFlare as the communicator rather than relying on XGBoost MPI, for more information, please refer to this [TechBlog](https://developer.nvidia.com/blog/federated-xgboost-made-practical-and-productive-with-nvidia-flare/).
+
+Model accuracy curves during training can be visualized in tensorboard,
+recorded in the simulator folder under `/tmp/nvflare/workspace/works/`.
+As expected, all histogram-based experiments result in curves identical to centralized training:
+
+
+Tree-based experiments can be run with:
+```
+bash run_experiment_horizontal_tree.sh
+```
+The resulting validation AUC curves are shown below:
+
+
+
+
+As illustrated, we make the following observations:
+- cyclic training performs reasonably well under the uniform split (the purple curve), but suffers a significant performance drop under the non-uniform split (the brown curve)
+- bagging training performs better than cyclic under both uniform and non-uniform data splits (orange vs. purple, red/green vs. brown)
+- with uniform shrinkage, bagging suffers a significant performance drop under the non-uniform split (green vs. orange)
+- data-size dependent shrinkage recovers the performance drop above (red vs. green) and achieves performance comparable to or better than the uniform data split (red vs. orange)
+- bagging under the uniform data split (orange), and bagging with data-size dependent shrinkage under the non-uniform data split (red), achieve performance comparable to or better than the centralized training baseline (blue)
+
+In terms of model size, centralized and cyclic training produce a model consisting of `num_round` trees,
+while the bagging model consists of `num_round * num_client` trees (e.g. 100 rounds with 5 clients yields 500 trees), since in each round
+bagging boosts a forest of individually trained trees, one from each client.
+
+### Vertical Experiments
+
+Create the psi job using the predefined psi_csv template:
+```
+nvflare job create -j ./jobs/vertical_xgb_psi -w psi_csv -sd ./code/psi -force
+```
+
+Run the psi job to calculate the dataset intersection of the clients at `psi/intersection.txt` inside the psi workspace:
+```
+nvflare simulator ./jobs/vertical_xgb_psi -w /tmp/nvflare/vertical_xgb_psi -n 2 -t 2
+```
+
+Create the vertical xgboost job using the predefined vertical_xgb template:
+```
+nvflare job create -j ./jobs/vertical_xgb -w vertical_xgb -sd ./code/vertical_xgb -force
+```
+
+Run the vertical xgboost job:
+```
+nvflare simulator ./jobs/vertical_xgb -w /tmp/nvflare/vertical_xgb -n 2 -t 2
+```
+
+Model accuracy can be visualized in tensorboard:
+```
+tensorboard --logdir /tmp/nvflare/vertical_xgb/server/simulate_job/tb_events
+```
+
+An example validation AUC graph (red) from running vertical XGBoost on HIGGS as compared with baseline centralized (blue):
+Since in this case we only used ~50k samples, the performance is worse than centralized training on the full dataset.
+
+
+
+## GPU Support
+By default, CPU based training is used.
+
+To enable GPU-accelerated training, first ensure that your machine has CUDA installed and at least one GPU.
+In `XGBFedController`, set `"use_gpus": true`.
+Then, in `FedXGBHistogramExecutor`, we can use the `device` parameter in `xgb_params` to map each rank to a GPU device ordinal.
+If using multiple GPUs, we can map each rank to a different GPU device; with a single GPU, each rank can be mapped to the same device.
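+
+For illustration, the rank-to-device mapping described above amounts to setting the `device` entry of `xgb_params` (here `rank` stands for the client's rank in the distributed XGBoost job):
+```
+rank = 0  # the client's rank in the distributed XGBoost job
+xgb_params = {"tree_method": "hist"}
+# one GPU per rank
+xgb_params["device"] = f"cuda:{rank}"
+# or, when sharing a single GPU across all ranks:
+# xgb_params["device"] = "cuda:0"
+```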
+
+
+## Reference
+[1] Zhao, L. et al., "InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy," IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, 2018, pp. 2087-2095
+
+[2] Yamamoto, F. et al., "New Approaches to Federated XGBoost Learning for Privacy-Preserving Data Analysis," ICONIP 2020 - International Conference on Neural Information Processing, 2020, Lecture Notes in Computer Science, vol 12533
diff --git a/examples/advanced/xgboost/tree-based/figs/20_client.png b/examples/advanced/xgboost/fedxgb/figs/20_client.png
similarity index 100%
rename from examples/advanced/xgboost/tree-based/figs/20_client.png
rename to examples/advanced/xgboost/fedxgb/figs/20_client.png
diff --git a/examples/advanced/xgboost/tree-based/figs/5_client.png b/examples/advanced/xgboost/fedxgb/figs/5_client.png
similarity index 100%
rename from examples/advanced/xgboost/tree-based/figs/5_client.png
rename to examples/advanced/xgboost/fedxgb/figs/5_client.png
diff --git a/examples/advanced/xgboost/tree-based/figs/Centralized.png b/examples/advanced/xgboost/fedxgb/figs/Centralized.png
similarity index 100%
rename from examples/advanced/xgboost/tree-based/figs/Centralized.png
rename to examples/advanced/xgboost/fedxgb/figs/Centralized.png
diff --git a/examples/advanced/xgboost/fedxgb/figs/histogram.png b/examples/advanced/xgboost/fedxgb/figs/histogram.png
new file mode 100644
index 0000000000..fb95949a7e
Binary files /dev/null and b/examples/advanced/xgboost/fedxgb/figs/histogram.png differ
diff --git a/examples/advanced/vertical_xgboost/figs/vertical_fl.png b/examples/advanced/xgboost/fedxgb/figs/vertical_fl.png
similarity index 100%
rename from examples/advanced/vertical_xgboost/figs/vertical_fl.png
rename to examples/advanced/xgboost/fedxgb/figs/vertical_fl.png
diff --git a/examples/advanced/xgboost/fedxgb/figs/vertical_xgb.png b/examples/advanced/xgboost/fedxgb/figs/vertical_xgb.png
new file mode 100644
index 0000000000..b5d9e0a2d1
Binary files /dev/null and b/examples/advanced/xgboost/fedxgb/figs/vertical_xgb.png differ
diff --git a/examples/advanced/xgboost/data_job_setup.ipynb b/examples/advanced/xgboost/fedxgb/notebooks/data_job_setup.ipynb
similarity index 100%
rename from examples/advanced/xgboost/data_job_setup.ipynb
rename to examples/advanced/xgboost/fedxgb/notebooks/data_job_setup.ipynb
diff --git a/examples/advanced/xgboost/histogram-based/xgboost_histogram_higgs.ipynb b/examples/advanced/xgboost/fedxgb/notebooks/xgboost_histogram_higgs.ipynb
similarity index 100%
rename from examples/advanced/xgboost/histogram-based/xgboost_histogram_higgs.ipynb
rename to examples/advanced/xgboost/fedxgb/notebooks/xgboost_histogram_higgs.ipynb
diff --git a/examples/advanced/xgboost/tree-based/xgboost_tree_higgs.ipynb b/examples/advanced/xgboost/fedxgb/notebooks/xgboost_tree_higgs.ipynb
similarity index 100%
rename from examples/advanced/xgboost/tree-based/xgboost_tree_higgs.ipynb
rename to examples/advanced/xgboost/fedxgb/notebooks/xgboost_tree_higgs.ipynb
diff --git a/examples/advanced/xgboost/fedxgb/prepare_data.sh b/examples/advanced/xgboost/fedxgb/prepare_data.sh
new file mode 100755
index 0000000000..5d687bb5a7
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/prepare_data.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+DATASET_PATH="${1}/HIGGS.csv"
+if [ ! -f "${DATASET_PATH}" ]
+then
+ echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
+ exit 1
+fi
+
+echo "Generating HIGGS data splits, reading from ${DATASET_PATH}"
+
+OUTPUT_PATH="/tmp/nvflare/dataset/xgboost_higgs_horizontal"
+for site_num in 2 5 20;
+do
+ for split_mode in uniform exponential square;
+ do
+ python3 utils/prepare_data_horizontal.py \
+ --data_path "${DATASET_PATH}" \
+ --site_num ${site_num} \
+ --size_total 11000000 \
+ --size_valid 1000000 \
+ --split_method ${split_mode} \
+ --out_path "${OUTPUT_PATH}/${site_num}_${split_mode}"
+ done
+done
+echo "Horizontal data splits are generated in ${OUTPUT_PATH}"
+
+OUTPUT_PATH="/tmp/nvflare/dataset/xgboost_higgs_vertical"
+OUTPUT_FILE="higgs.data.csv"
+# Note: HIGGS has 11 million preshuffled instances; using rows_total_percentage to reduce PSI time for example
+python3 utils/prepare_data_vertical.py \
+--data_path "${DATASET_PATH}" \
+--site_num 2 \
+--rows_total_percentage 0.02 \
+--rows_overlap_percentage 0.25 \
+--out_path "${OUTPUT_PATH}" \
+--out_file "${OUTPUT_FILE}"
+echo "Vertical data splits are generated in ${OUTPUT_PATH}"
diff --git a/examples/advanced/xgboost/fedxgb/run_experiment_centralized.sh b/examples/advanced/xgboost/fedxgb/run_experiment_centralized.sh
new file mode 100755
index 0000000000..4a34d6e8c3
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/run_experiment_centralized.sh
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+DATASET_PATH="${1}/HIGGS.csv"
+if [ ! -f "${DATASET_PATH}" ]
+then
+ echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
+ exit 1
+fi
+
+python3 utils/baseline_centralized.py --num_parallel_tree 1 --data_path "${DATASET_PATH}"
+python3 utils/baseline_centralized.py --num_parallel_tree 1 --data_path "${DATASET_PATH}" --train_in_one_session
+python3 utils/baseline_centralized.py --num_parallel_tree 5 --subsample 0.8 --data_path "${DATASET_PATH}"
+python3 utils/baseline_centralized.py --num_parallel_tree 5 --subsample 0.2 --data_path "${DATASET_PATH}"
+python3 utils/baseline_centralized.py --num_parallel_tree 20 --subsample 0.8 --data_path "${DATASET_PATH}"
+python3 utils/baseline_centralized.py --num_parallel_tree 20 --subsample 0.05 --data_path "${DATASET_PATH}"
+
diff --git a/examples/advanced/xgboost/fedxgb/run_experiment_horizontal_histogram.sh b/examples/advanced/xgboost/fedxgb/run_experiment_horizontal_histogram.sh
new file mode 100755
index 0000000000..c04735e15f
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/run_experiment_horizontal_histogram.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+python3 xgb_fl_job_horizontal.py --site_num 2 --training_algo histogram --split_method uniform --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo histogram --split_method uniform --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 2 --training_algo histogram_v2 --split_method uniform --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo histogram_v2 --split_method uniform --lr_mode uniform --data_split_mode horizontal
diff --git a/examples/advanced/xgboost/fedxgb/run_experiment_horizontal_tree.sh b/examples/advanced/xgboost/fedxgb/run_experiment_horizontal_tree.sh
new file mode 100755
index 0000000000..3ab5573e63
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/run_experiment_horizontal_tree.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo bagging --split_method exponential --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo bagging --split_method exponential --lr_mode scaled --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo bagging --split_method uniform --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo cyclic --split_method exponential --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 5 --training_algo cyclic --split_method uniform --lr_mode uniform --data_split_mode horizontal
+
+python3 xgb_fl_job_horizontal.py --site_num 20 --training_algo bagging --split_method square --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 20 --training_algo bagging --split_method square --lr_mode scaled --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 20 --training_algo bagging --split_method uniform --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 20 --training_algo cyclic --split_method square --lr_mode uniform --data_split_mode horizontal
+python3 xgb_fl_job_horizontal.py --site_num 20 --training_algo cyclic --split_method uniform --lr_mode uniform --data_split_mode horizontal
\ No newline at end of file
diff --git a/examples/advanced/xgboost/fedxgb/run_experiment_vertical.sh b/examples/advanced/xgboost/fedxgb/run_experiment_vertical.sh
new file mode 100755
index 0000000000..35abee98fd
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/run_experiment_vertical.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+python3 xgb_fl_job_vertical_psi.py
+python3 xgb_fl_job_vertical.py
\ No newline at end of file
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base/app/custom/higgs_data_loader.py b/examples/advanced/xgboost/fedxgb/src/higgs_data_loader.py
similarity index 100%
rename from examples/advanced/xgboost/histogram-based/jobs/base/app/custom/higgs_data_loader.py
rename to examples/advanced/xgboost/fedxgb/src/higgs_data_loader.py
diff --git a/examples/advanced/vertical_xgboost/code/psi/local_psi.py b/examples/advanced/xgboost/fedxgb/src/local_psi.py
similarity index 100%
rename from examples/advanced/vertical_xgboost/code/psi/local_psi.py
rename to examples/advanced/xgboost/fedxgb/src/local_psi.py
diff --git a/examples/advanced/vertical_xgboost/code/vertical_xgb/vertical_data_loader.py b/examples/advanced/xgboost/fedxgb/src/vertical_data_loader.py
similarity index 100%
rename from examples/advanced/vertical_xgboost/code/vertical_xgb/vertical_data_loader.py
rename to examples/advanced/xgboost/fedxgb/src/vertical_data_loader.py
diff --git a/examples/advanced/xgboost/utils/baseline_centralized.py b/examples/advanced/xgboost/fedxgb/utils/baseline_centralized.py
similarity index 94%
rename from examples/advanced/xgboost/utils/baseline_centralized.py
rename to examples/advanced/xgboost/fedxgb/utils/baseline_centralized.py
index a462a539a7..dcac3e5f01 100644
--- a/examples/advanced/xgboost/utils/baseline_centralized.py
+++ b/examples/advanced/xgboost/fedxgb/utils/baseline_centralized.py
@@ -25,11 +25,13 @@
def xgboost_args_parser():
parser = argparse.ArgumentParser(description="Centralized XGBoost training with random forest options")
- parser.add_argument("--data_path", type=str, default="./dataset/HIGGS_UCI.csv", help="path to dataset file")
+ parser.add_argument("--data_path", type=str, help="path to dataset file")
parser.add_argument("--num_parallel_tree", type=int, default=1, help="num_parallel_tree for random forest setting")
parser.add_argument("--subsample", type=float, default=1, help="subsample for random forest setting")
parser.add_argument("--num_rounds", type=int, default=100, help="number of boosting rounds")
- parser.add_argument("--workspace_root", type=str, default="workspaces", help="workspaces root")
+ parser.add_argument(
+ "--workspace_root", type=str, default="/tmp/nvflare/workspace/centralized", help="workspaces root"
+ )
parser.add_argument("--tree_method", type=str, default="hist", help="tree_method")
parser.add_argument("--train_in_one_session", action="store_true", help="whether to train in one session")
return parser
@@ -63,7 +65,7 @@ def train_one_by_one(train_data, val_data, xgb_params, num_rounds, val_label, wr
y_pred = bst_last.predict(val_data)
roc = roc_auc_score(val_label, y_pred)
print(f"Round: {bst_last.num_boosted_rounds()} model testing AUC {roc}")
- writer.add_scalar("AUC", roc, r - 1)
+ writer.add_scalar("eval_metrics", roc, r - 1)
# Train new model
print(f"Round: {r} Base ", end="")
bst = xgb.train(
@@ -152,7 +154,7 @@ def main():
y_pred = bst.predict(dmat_valid)
roc = roc_auc_score(y_higgs[0:valid_num], y_pred)
print(f"Base model: {roc}")
- writer.add_scalar("AUC", roc, num_rounds - 1)
+ writer.add_scalar("eval_metrics", roc, num_rounds - 1)
writer.close()
diff --git a/examples/advanced/xgboost/utils/prepare_data_split.py b/examples/advanced/xgboost/fedxgb/utils/prepare_data_horizontal.py
similarity index 100%
rename from examples/advanced/xgboost/utils/prepare_data_split.py
rename to examples/advanced/xgboost/fedxgb/utils/prepare_data_horizontal.py
diff --git a/examples/advanced/vertical_xgboost/utils/prepare_data.py b/examples/advanced/xgboost/fedxgb/utils/prepare_data_vertical.py
similarity index 100%
rename from examples/advanced/vertical_xgboost/utils/prepare_data.py
rename to examples/advanced/xgboost/fedxgb/utils/prepare_data_vertical.py
diff --git a/examples/advanced/xgboost/fedxgb/xgb_fl_job_horizontal.py b/examples/advanced/xgboost/fedxgb/xgb_fl_job_horizontal.py
new file mode 100644
index 0000000000..718f7fd624
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/xgb_fl_job_horizontal.py
@@ -0,0 +1,260 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import json
+import os
+
+from src.higgs_data_loader import HIGGSDataLoader
+
+from nvflare.app_common.widgets.convert_to_fed_event import ConvertToFedEvent
+from nvflare.app_opt.tracking.tb.tb_receiver import TBAnalyticsReceiver
+from nvflare.app_opt.tracking.tb.tb_writer import TBWriter
+from nvflare.job_config.api import FedJob
+
+ALGO_DIR_MAP = {
+ "bagging": "tree-based",
+ "cyclic": "tree-based",
+ "histogram": "histogram-based",
+ "histogram_v2": "histogram-based",
+}
+
+
+def define_parser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--data_root",
+ type=str,
+ default="/tmp/nvflare/dataset/xgboost_higgs",
+ help="Path to dataset files for each site",
+ )
+ parser.add_argument("--site_num", type=int, default=2, help="Total number of sites")
+ parser.add_argument("--round_num", type=int, default=100, help="Total number of training rounds")
+ parser.add_argument(
+ "--training_algo", type=str, default="histogram", choices=list(ALGO_DIR_MAP.keys()), help="Training algorithm"
+ )
+ parser.add_argument("--split_method", type=str, default="uniform", help="How to split the dataset")
+ parser.add_argument("--lr_mode", type=str, default="uniform", help="Whether to use uniform or scaled shrinkage")
+ parser.add_argument("--nthread", type=int, default=16, help="nthread for xgboost")
+ parser.add_argument(
+ "--tree_method", type=str, default="hist", help="tree_method for xgboost - use hist for best perf"
+ )
+ parser.add_argument(
+ "--data_split_mode",
+ type=str,
+ default="horizontal",
+ choices=["horizontal", "vertical"],
+ help="dataset split mode, horizontal or vertical",
+ )
+ return parser.parse_args()
+
+
+def _get_job_name(args) -> str:
+ return f"higgs_{args.site_num}_{args.training_algo}_{args.split_method}_split_{args.lr_mode}_lr"
+
+
+def _get_data_path(args) -> str:
+ return f"{args.data_root}_{args.data_split_mode}/{args.site_num}_{args.split_method}"
+
+
+def _read_json(filename):
+ if not os.path.isfile(filename):
+ raise ValueError(f"{filename} does not exist!")
+ with open(filename, "r") as f:
+ return json.load(f)
+
+
+def _get_lr_scale_from_split_json(data_split: dict):
+ split = {}
+ total_data_num = 0
+ for k, v in data_split["data_index"].items():
+ if k == "valid":
+ continue
+ data_num = int(v["end"] - v["start"])
+ total_data_num += data_num
+ split[k] = data_num
+
+ lr_scales = {}
+ for k in split:
+ lr_scales[k] = split[k] / total_data_num
+
+ return lr_scales
+
+
+def main():
+ args = define_parser()
+ job_name = _get_job_name(args)
+ dataset_path = _get_data_path(args)
+
+ site_num = args.site_num
+ job = FedJob(name=job_name, min_clients=site_num)
+
+ # Define the controller workflow and send to server
+ if args.training_algo == "histogram":
+ from nvflare.app_opt.xgboost.histogram_based.controller import XGBFedController
+
+ controller = XGBFedController()
+ from nvflare.app_opt.xgboost.histogram_based.executor import FedXGBHistogramExecutor
+
+ executor = FedXGBHistogramExecutor(
+ data_loader_id="dataloader",
+ num_rounds=args.round_num,
+ early_stopping_rounds=2,
+ metrics_writer_id="metrics_writer",
+ xgb_params={
+ "max_depth": 8,
+ "eta": 0.1,
+ "objective": "binary:logistic",
+ "eval_metric": "auc",
+ "tree_method": "hist",
+ "nthread": 16,
+ },
+ )
+ # Add tensorboard receiver to server
+ tb_receiver = TBAnalyticsReceiver(
+ tb_folder="tb_events",
+ )
+ job.to_server(tb_receiver, id="tb_receiver")
+ elif args.training_algo == "histogram_v2":
+ from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController
+
+ controller = XGBFedController(
+ num_rounds=args.round_num,
+ data_split_mode=0,
+ secure_training=False,
+ xgb_options={"early_stopping_rounds": 2, "use_gpus": False},
+ xgb_params={
+ "max_depth": 8,
+ "eta": 0.1,
+ "objective": "binary:logistic",
+ "eval_metric": "auc",
+ "tree_method": "hist",
+ "nthread": 16,
+ },
+ )
+ from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import FedXGBHistogramExecutor
+
+ executor = FedXGBHistogramExecutor(
+ data_loader_id="dataloader",
+ metrics_writer_id="metrics_writer",
+ )
+ # Add tensorboard receiver to server
+ tb_receiver = TBAnalyticsReceiver(
+ tb_folder="tb_events",
+ )
+ job.to_server(tb_receiver, id="tb_receiver")
+ elif args.training_algo == "bagging":
+ from nvflare.app_common.workflows.scatter_and_gather import ScatterAndGather
+
+ controller = ScatterAndGather(
+ min_clients=args.site_num,
+ num_rounds=args.round_num,
+ start_round=0,
+ aggregator_id="aggregator",
+ persistor_id="persistor",
+ shareable_generator_id="shareable_generator",
+ wait_time_after_min_received=0,
+ train_timeout=0,
+ allow_empty_global_weights=True,
+ task_check_period=0.01,
+ persist_every_n_rounds=0,
+ snapshot_every_n_rounds=0,
+ )
+ from nvflare.app_opt.xgboost.tree_based.model_persistor import XGBModelPersistor
+
+ persistor = XGBModelPersistor(save_name="xgboost_model.json")
+ from nvflare.app_opt.xgboost.tree_based.shareable_generator import XGBModelShareableGenerator
+
+ shareable_generator = XGBModelShareableGenerator()
+ from nvflare.app_opt.xgboost.tree_based.bagging_aggregator import XGBBaggingAggregator
+
+ aggregator = XGBBaggingAggregator()
+ job.to_server(persistor, id="persistor")
+ job.to_server(shareable_generator, id="shareable_generator")
+ job.to_server(aggregator, id="aggregator")
+ elif args.training_algo == "cyclic":
+ from nvflare.app_common.workflows.cyclic_ctl import CyclicController
+
+ controller = CyclicController(
+ num_rounds=int(args.round_num / args.site_num),
+ task_assignment_timeout=60,
+ persistor_id="persistor",
+ shareable_generator_id="shareable_generator",
+ task_check_period=0.01,
+ persist_every_n_rounds=0,
+ snapshot_every_n_rounds=0,
+ )
+ from nvflare.app_opt.xgboost.tree_based.model_persistor import XGBModelPersistor
+
+ persistor = XGBModelPersistor(save_name="xgboost_model.json", load_as_dict=False)
+ from nvflare.app_opt.xgboost.tree_based.shareable_generator import XGBModelShareableGenerator
+
+ shareable_generator = XGBModelShareableGenerator()
+ job.to_server(persistor, id="persistor")
+ job.to_server(shareable_generator, id="shareable_generator")
+ # send controller to server
+ job.to_server(controller, id="xgb_controller")
+
+ # Add executor and other components to clients
+ for site_id in range(1, site_num + 1):
+ if args.training_algo in ["bagging", "cyclic"]:
+ lr_scale = 1
+ num_client_bagging = 1
+ if args.training_algo == "bagging":
+ num_client_bagging = args.site_num
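+                # With lr_mode "scaled", each site's learning rate is weighted by its
+                # share of the training data (read from the data split JSON), which
+                # helps bagging on non-uniform quantity splits.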
+ if args.lr_mode == "scaled":
+ data_split = _read_json(f"{dataset_path}/data_site-{site_id}.json")
+ lr_scales = _get_lr_scale_from_split_json(data_split)
+ lr_scale = lr_scales[f"site-{site_id}"]
+ from nvflare.app_opt.xgboost.tree_based.executor import FedXGBTreeExecutor
+
+ executor = FedXGBTreeExecutor(
+ data_loader_id="dataloader",
+ training_mode=args.training_algo,
+ num_client_bagging=num_client_bagging,
+ num_local_parallel_tree=1,
+ local_subsample=1,
+ local_model_path="model.json",
+ global_model_path="model_global.json",
+ learning_rate=0.1,
+ objective="binary:logistic",
+ max_depth=8,
+ eval_metric="auc",
+ tree_method="hist",
+ nthread=16,
+ lr_scale=lr_scale,
+ lr_mode=args.lr_mode,
+ )
+ job.to(executor, f"site-{site_id}")
+
+ dataloader = HIGGSDataLoader(data_split_filename=f"{dataset_path}/data_site-{site_id}.json")
+ job.to(dataloader, f"site-{site_id}", id="dataloader")
+
+ if args.training_algo in ["histogram", "histogram_v2"]:
+ metrics_writer = TBWriter(event_type="analytix_log_stats")
+ job.to(metrics_writer, f"site-{site_id}", id="metrics_writer")
+
+ event_to_fed = ConvertToFedEvent(
+ events_to_convert=["analytix_log_stats"],
+ fed_event_prefix="fed.",
+ )
+ job.to(event_to_fed, f"site-{site_id}", id="event_to_fed")
+
+ # Export job config and run the job
+ job.export_job("/tmp/nvflare/workspace/jobs/")
+ job.simulator_run(f"/tmp/nvflare/workspace/works/{job_name}")
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/advanced/xgboost/fedxgb/xgb_fl_job_vertical.py b/examples/advanced/xgboost/fedxgb/xgb_fl_job_vertical.py
new file mode 100644
index 0000000000..717188f681
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/xgb_fl_job_vertical.py
@@ -0,0 +1,107 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+
+from src.vertical_data_loader import VerticalDataLoader
+
+from nvflare.app_common.widgets.convert_to_fed_event import ConvertToFedEvent
+from nvflare.app_opt.tracking.tb.tb_receiver import TBAnalyticsReceiver
+from nvflare.app_opt.tracking.tb.tb_writer import TBWriter
+from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController
+from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import FedXGBHistogramExecutor
+from nvflare.job_config.api import FedJob
+
+
+def define_parser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--data_split_path",
+ type=str,
+ default="/tmp/nvflare/dataset/xgboost_higgs_vertical/{SITE_NAME}/higgs.data.csv",
+ help="Path to data split files for each site",
+ )
+ parser.add_argument(
+ "--psi_path",
+ type=str,
+ default="/tmp/nvflare/workspace/works/vertical_xgb_psi/{SITE_NAME}/simulate_job/{SITE_NAME}/psi/intersection.txt",
+ help="Path to psi files for each site",
+ )
+ parser.add_argument("--site_num", type=int, default=2, help="Total number of sites")
+ parser.add_argument("--round_num", type=int, default=100, help="Total number of training rounds")
+ return parser.parse_args()
+
+
+def main():
+ args = define_parser()
+ data_split_path = args.data_split_path
+ psi_path = args.psi_path
+ site_num = args.site_num
+ round_num = args.round_num
+ job_name = "xgboost_vertical"
+ job = FedJob(name=job_name, min_clients=site_num)
+
+ # Define the controller workflow and send to server
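+    # data_split_mode=1 selects a vertical (column-wise) split: sites share the same
+    # rows but hold different feature columns, and site-1 owns the label column.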
+ controller = XGBFedController(
+ num_rounds=round_num,
+ data_split_mode=1,
+ secure_training=False,
+ xgb_options={"early_stopping_rounds": 3, "use_gpus": False},
+ xgb_params={
+ "max_depth": 8,
+ "eta": 0.1,
+ "objective": "binary:logistic",
+ "eval_metric": "auc",
+ "tree_method": "hist",
+ "nthread": 16,
+ },
+ )
+ job.to_server(controller, id="xgb_controller")
+
+ # Add tensorboard receiver to server
+ tb_receiver = TBAnalyticsReceiver(
+ tb_folder="tb_events",
+ )
+ job.to_server(tb_receiver, id="tb_receiver")
+
+ # Define the executor and send to clients
+ executor = FedXGBHistogramExecutor(
+ data_loader_id="dataloader",
+ metrics_writer_id="metrics_writer",
+ in_process=True,
+ model_file_name="test.model.json",
+ )
+ job.to_clients(executor, id="xgb_hist_executor", tasks=["config", "start"])
+
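+    # The vertical data loader reads each site's column split and keeps only the sample
+    # IDs listed in the PSI intersection file, with train_proportion controlling the
+    # train/validation split of the remaining rows.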
+ dataloader = VerticalDataLoader(
+ data_split_path=data_split_path, psi_path=psi_path, id_col="uid", label_owner="site-1", train_proportion=0.8
+ )
+ job.to_clients(dataloader, id="dataloader")
+
+ metrics_writer = TBWriter(event_type="analytix_log_stats")
+ job.to_clients(metrics_writer, id="metrics_writer")
+
+ event_to_fed = ConvertToFedEvent(
+ events_to_convert=["analytix_log_stats"],
+ fed_event_prefix="fed.",
+ )
+ job.to_clients(event_to_fed, id="event_to_fed")
+
+ # Export job config and run the job
+ job.export_job("/tmp/nvflare/workspace/jobs/")
+ job.simulator_run(f"/tmp/nvflare/workspace/works/{job_name}", n_clients=site_num)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/advanced/xgboost/fedxgb/xgb_fl_job_vertical_psi.py b/examples/advanced/xgboost/fedxgb/xgb_fl_job_vertical_psi.py
new file mode 100644
index 0000000000..fc4954e772
--- /dev/null
+++ b/examples/advanced/xgboost/fedxgb/xgb_fl_job_vertical_psi.py
@@ -0,0 +1,70 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+
+from src.local_psi import LocalPSI
+
+from nvflare.app_common.psi.dh_psi.dh_psi_controller import DhPSIController
+from nvflare.app_common.psi.file_psi_writer import FilePSIWriter
+from nvflare.app_common.psi.psi_executor import PSIExecutor
+from nvflare.app_opt.psi.dh_psi.dh_psi_task_handler import DhPSITaskHandler
+from nvflare.job_config.api import FedJob
+
+
+def define_parser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--data_split_path",
+ type=str,
+ default="/tmp/nvflare/dataset/xgboost_higgs_vertical/site-x/higgs.data.csv",
+ help="Path to data split files for each site",
+ )
+ parser.add_argument("--site_num", type=int, default=2, help="Total number of sites")
+ parser.add_argument("--psi_path", type=str, default="psi/intersection.txt", help="PSI ouput path")
+ return parser.parse_args()
+
+
+def main():
+ args = define_parser()
+ data_split_path = args.data_split_path
+ psi_path = args.psi_path
+ site_num = args.site_num
+ job_name = "xgboost_vertical_psi"
+ job = FedJob(name=job_name, min_clients=site_num)
+
+ # Define the controller workflow and send to server
+ controller = DhPSIController()
+ job.to_server(controller)
+
+ # Define the executor and other components for each site
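+    # Task flow: PSIExecutor receives the "PSI" task and delegates to the DH-PSI task
+    # handler ("dh_psi"), which runs LocalPSI over each site's "uid" column and writes
+    # the resulting intersection to psi_path via the "psi_writer" component.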
+ executor = PSIExecutor(psi_algo_id="dh_psi")
+ job.to_clients(executor, id="psi_executor", tasks=["PSI"])
+
+ local_psi = LocalPSI(psi_writer_id="psi_writer", data_split_path=data_split_path, id_col="uid")
+ job.to_clients(local_psi, id="local_psi")
+
+ task_handler = DhPSITaskHandler(local_psi_id="local_psi")
+ job.to_clients(task_handler, id="dh_psi")
+
+ psi_writer = FilePSIWriter(output_path=psi_path)
+ job.to_clients(psi_writer, id="psi_writer")
+
+ # Export job config and run the job
+ job.export_job("/tmp/nvflare/workspace/jobs/")
+ job.simulator_run(f"/tmp/nvflare/workspace/works/{job_name}", n_clients=site_num)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/advanced/xgboost_secure/.gitignore b/examples/advanced/xgboost/fedxgb_secure/.gitignore
similarity index 100%
rename from examples/advanced/xgboost_secure/.gitignore
rename to examples/advanced/xgboost/fedxgb_secure/.gitignore
diff --git a/examples/advanced/xgboost_secure/README.md b/examples/advanced/xgboost/fedxgb_secure/README.md
similarity index 100%
rename from examples/advanced/xgboost_secure/README.md
rename to examples/advanced/xgboost/fedxgb_secure/README.md
diff --git a/examples/advanced/xgboost_secure/figs/tree.base.png b/examples/advanced/xgboost/fedxgb_secure/figs/tree.base.png
similarity index 100%
rename from examples/advanced/xgboost_secure/figs/tree.base.png
rename to examples/advanced/xgboost/fedxgb_secure/figs/tree.base.png
diff --git a/examples/advanced/xgboost_secure/figs/tree.vert.secure.0.png b/examples/advanced/xgboost/fedxgb_secure/figs/tree.vert.secure.0.png
similarity index 100%
rename from examples/advanced/xgboost_secure/figs/tree.vert.secure.0.png
rename to examples/advanced/xgboost/fedxgb_secure/figs/tree.vert.secure.0.png
diff --git a/examples/advanced/xgboost_secure/figs/tree.vert.secure.1.png b/examples/advanced/xgboost/fedxgb_secure/figs/tree.vert.secure.1.png
similarity index 100%
rename from examples/advanced/xgboost_secure/figs/tree.vert.secure.1.png
rename to examples/advanced/xgboost/fedxgb_secure/figs/tree.vert.secure.1.png
diff --git a/examples/advanced/xgboost_secure/figs/tree.vert.secure.2.png b/examples/advanced/xgboost/fedxgb_secure/figs/tree.vert.secure.2.png
similarity index 100%
rename from examples/advanced/xgboost_secure/figs/tree.vert.secure.2.png
rename to examples/advanced/xgboost/fedxgb_secure/figs/tree.vert.secure.2.png
diff --git a/examples/advanced/xgboost_secure/prepare_data.sh b/examples/advanced/xgboost/fedxgb_secure/prepare_data.sh
similarity index 100%
rename from examples/advanced/xgboost_secure/prepare_data.sh
rename to examples/advanced/xgboost/fedxgb_secure/prepare_data.sh
diff --git a/examples/advanced/xgboost_secure/prepare_flare_job.sh b/examples/advanced/xgboost/fedxgb_secure/prepare_flare_job.sh
similarity index 100%
rename from examples/advanced/xgboost_secure/prepare_flare_job.sh
rename to examples/advanced/xgboost/fedxgb_secure/prepare_flare_job.sh
diff --git a/examples/advanced/xgboost_secure/project.yml b/examples/advanced/xgboost/fedxgb_secure/project.yml
similarity index 100%
rename from examples/advanced/xgboost_secure/project.yml
rename to examples/advanced/xgboost/fedxgb_secure/project.yml
diff --git a/examples/advanced/xgboost_secure/run_training_flare.sh b/examples/advanced/xgboost/fedxgb_secure/run_training_flare.sh
similarity index 100%
rename from examples/advanced/xgboost_secure/run_training_flare.sh
rename to examples/advanced/xgboost/fedxgb_secure/run_training_flare.sh
diff --git a/examples/advanced/xgboost_secure/run_training_standalone.sh b/examples/advanced/xgboost/fedxgb_secure/run_training_standalone.sh
similarity index 100%
rename from examples/advanced/xgboost_secure/run_training_standalone.sh
rename to examples/advanced/xgboost/fedxgb_secure/run_training_standalone.sh
diff --git a/examples/advanced/xgboost_secure/train_standalone/train_base.py b/examples/advanced/xgboost/fedxgb_secure/train_standalone/train_base.py
similarity index 98%
rename from examples/advanced/xgboost_secure/train_standalone/train_base.py
rename to examples/advanced/xgboost/fedxgb_secure/train_standalone/train_base.py
index 58db56b94c..a8762b9e29 100644
--- a/examples/advanced/xgboost_secure/train_standalone/train_base.py
+++ b/examples/advanced/xgboost/fedxgb_secure/train_standalone/train_base.py
@@ -38,7 +38,7 @@ def train_base_args_parser():
parser.add_argument(
"--out_path",
type=str,
- default="/tmp/nvflare/xgboost_secure/train_standalone/base",
+ default="/tmp/nvflare/fedxgb_secure/train_standalone/base",
help="Output path for the data split file",
)
return parser
diff --git a/examples/advanced/xgboost_secure/train_standalone/train_federated.py b/examples/advanced/xgboost/fedxgb_secure/train_standalone/train_federated.py
similarity index 98%
rename from examples/advanced/xgboost_secure/train_standalone/train_federated.py
rename to examples/advanced/xgboost/fedxgb_secure/train_standalone/train_federated.py
index 808e88fa17..f4aad83054 100644
--- a/examples/advanced/xgboost_secure/train_standalone/train_federated.py
+++ b/examples/advanced/xgboost/fedxgb_secure/train_standalone/train_federated.py
@@ -48,7 +48,7 @@ def train_federated_args_parser():
parser.add_argument(
"--out_path",
type=str,
- default="/tmp/nvflare/xgboost_secure/train_standalone/federated",
+ default="/tmp/nvflare/fedxgb_secure/train_standalone/federated",
help="Output path for the data split file",
)
return parser
diff --git a/examples/advanced/xgboost_secure/utils/prepare_data_base.py b/examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_base.py
similarity index 100%
rename from examples/advanced/xgboost_secure/utils/prepare_data_base.py
rename to examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_base.py
diff --git a/examples/advanced/xgboost_secure/utils/prepare_data_horizontal.py b/examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_horizontal.py
similarity index 100%
rename from examples/advanced/xgboost_secure/utils/prepare_data_horizontal.py
rename to examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_horizontal.py
diff --git a/examples/advanced/xgboost_secure/utils/prepare_data_traintest_split.py b/examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_traintest_split.py
similarity index 100%
rename from examples/advanced/xgboost_secure/utils/prepare_data_traintest_split.py
rename to examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_traintest_split.py
diff --git a/examples/advanced/xgboost_secure/utils/prepare_data_vertical.py b/examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_vertical.py
similarity index 100%
rename from examples/advanced/xgboost_secure/utils/prepare_data_vertical.py
rename to examples/advanced/xgboost/fedxgb_secure/utils/prepare_data_vertical.py
diff --git a/examples/advanced/xgboost/histogram-based/README.md b/examples/advanced/xgboost/histogram-based/README.md
deleted file mode 100644
index 8c89f95eff..0000000000
--- a/examples/advanced/xgboost/histogram-based/README.md
+++ /dev/null
@@ -1,77 +0,0 @@
-# Histogram-based Federated Learning for XGBoost
-
-## Run automated experiments
-Please make sure to finish the [preparation steps](../README.md) before running the following steps.
-To run this example with NVFlare, follow the steps below or this [notebook](./xgboost_histogram_higgs.ipynb) for an interactive experience.
-
-### Environment Preparation
-
-Switch to this directory and install the additional requirements (we suggest doing this inside a virtual environment):
-```
-python3 -m pip install -r requirements.txt
-```
-
-### Run centralized experiments
-```
-bash run_experiment_centralized.sh
-```
-
-### Run federated experiments with simulator locally
-Next, we will use the NVFlare simulator to run FL training automatically.
-```
-nvflare simulator jobs/higgs_2_histogram_v2_uniform_split_uniform_lr \
- -w /tmp/nvflare/xgboost_v2_workspace -n 2 -t 2
-```
-
-Model accuracy can be visualized in tensorboard:
-```
-tensorboard --logdir /tmp/nvflare/xgboost_v2_workspace/simulate_job/tb_events
-```
-
-### Run federated experiments in real world
-
-To run in a federated setting, follow [Real-World FL](https://nvflare.readthedocs.io/en/main/real_world_fl.html) to
-start the overseer, FL servers and FL clients.
-
-You need to download the HIGGS data on each client site.
-You will also need to install XGBoost on each client site and server site.
-
-You can still generate the data splits and job configs using the scripts provided.
-
-You will need to copy the generated data split file into each client site.
-You might also need to modify the `data_path` in the `data_site-XXX.json`
-inside the `/tmp/nvflare/xgboost_higgs_dataset` folder,
-since each site might save the HIGGS dataset in different places.
-
-Then, you can use the admin client to submit the job via the `submit_job` command.
-
-## Customization
-
-The provided XGBoost executor can be customized through the boosting parameters
-passed in the `xgb_params` argument.
-
-If the parameter change alone is not sufficient and code changes are required,
-a custom executor can be implemented to call the xgboost library directly.
-
-The custom executor can inherit the base class `FedXGBHistogramExecutor` and
-override the `xgb_train()` method.
-
-To use a different dataset, you can inherit the base class `XGBDataLoader` and
-implement the `load_data()` method.
-
-## Loose integration
-
-We can use the NVFlare controller/executor just to launch the external xgboost
-federated server and client.
-
-### Run federated experiments with simulator locally
-Next, we will use the NVFlare simulator to run FL training automatically.
-```
-nvflare simulator jobs/higgs_2_histogram_uniform_split_uniform_lr \
- -w /tmp/nvflare/xgboost_workspace -n 2 -t 2
-```
-
-Model accuracy can be visualized in tensorboard:
-```
-tensorboard --logdir /tmp/nvflare/xgboost_workspace/simulate_job/tb_events
-```
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base/app/config/config_fed_client.json b/examples/advanced/xgboost/histogram-based/jobs/base/app/config/config_fed_client.json
deleted file mode 100755
index a3fe316d90..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base/app/config/config_fed_client.json
+++ /dev/null
@@ -1,50 +0,0 @@
-{
- "format_version": 2,
- "num_rounds": 100,
- "executors": [
- {
- "tasks": [
- "train"
- ],
- "executor": {
- "id": "Executor",
- "path": "nvflare.app_opt.xgboost.histogram_based.executor.FedXGBHistogramExecutor",
- "args": {
- "data_loader_id": "dataloader",
- "num_rounds": "{num_rounds}",
- "early_stopping_rounds": 2,
- "metrics_writer_id": "metrics_writer",
- "xgb_params": {
- "max_depth": 8,
- "eta": 0.1,
- "objective": "binary:logistic",
- "eval_metric": "auc",
- "tree_method": "hist",
- "nthread": 16
- }
- }
- }
- }
- ],
- "task_result_filters": [],
- "task_data_filters": [],
- "components": [
- {
- "id": "dataloader",
- "path": "higgs_data_loader.HIGGSDataLoader",
- "args": {
- "data_split_filename": "data_split.json"
- }
- },
- {
- "id": "metrics_writer",
- "path": "nvflare.app_opt.tracking.tb.tb_writer.TBWriter",
- "args": {"event_type": "analytix_log_stats"}
- },
- {
- "id": "event_to_fed",
- "path": "nvflare.app_common.widgets.convert_to_fed_event.ConvertToFedEvent",
- "args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."}
- }
- ]
-}
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base/app/config/config_fed_server.json b/examples/advanced/xgboost/histogram-based/jobs/base/app/config/config_fed_server.json
deleted file mode 100755
index 9814f32e2c..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base/app/config/config_fed_server.json
+++ /dev/null
@@ -1,23 +0,0 @@
-{
- "format_version": 2,
- "task_data_filters": [],
- "task_result_filters": [],
- "components": [
- {
- "id": "tb_receiver",
- "path": "nvflare.app_opt.tracking.tb.tb_receiver.TBAnalyticsReceiver",
- "args": {
- "tb_folder": "tb_events"
- }
- }
- ],
- "workflows": [
- {
- "id": "xgb_controller",
- "path": "nvflare.app_opt.xgboost.histogram_based.controller.XGBFedController",
- "args": {
- "train_timeout": 30000
- }
- }
- ]
-}
\ No newline at end of file
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base/meta.json b/examples/advanced/xgboost/histogram-based/jobs/base/meta.json
deleted file mode 100644
index 68fc7c42e0..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base/meta.json
+++ /dev/null
@@ -1,10 +0,0 @@
-{
- "name": "xgboost_histogram_based",
- "resource_spec": {},
- "deploy_map": {
- "app": [
- "@ALL"
- ]
- },
- "min_clients": 2
-}
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/config/config_fed_client.json b/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/config/config_fed_client.json
deleted file mode 100755
index a23a960c3d..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/config/config_fed_client.json
+++ /dev/null
@@ -1,39 +0,0 @@
-{
- "format_version": 2,
- "executors": [
- {
- "tasks": [
- "config", "start"
- ],
- "executor": {
- "id": "Executor",
- "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
- "args": {
- "data_loader_id": "dataloader",
- "metrics_writer_id": "metrics_writer"
- }
- }
- }
- ],
- "task_result_filters": [],
- "task_data_filters": [],
- "components": [
- {
- "id": "dataloader",
- "path": "higgs_data_loader.HIGGSDataLoader",
- "args": {
- "data_split_filename": "data_split.json"
- }
- },
- {
- "id": "metrics_writer",
- "path": "nvflare.app_opt.tracking.tb.tb_writer.TBWriter",
- "args": {"event_type": "analytix_log_stats"}
- },
- {
- "id": "event_to_fed",
- "path": "nvflare.app_common.widgets.convert_to_fed_event.ConvertToFedEvent",
- "args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."}
- }
- ]
-}
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/config/config_fed_server.json b/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/config/config_fed_server.json
deleted file mode 100755
index d0dd1e3908..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/config/config_fed_server.json
+++ /dev/null
@@ -1,37 +0,0 @@
-{
- "format_version": 2,
- "num_rounds": 100,
- "task_data_filters": [],
- "task_result_filters": [],
- "components": [
- {
- "id": "tb_receiver",
- "path": "nvflare.app_opt.tracking.tb.tb_receiver.TBAnalyticsReceiver",
- "args": {
- "tb_folder": "tb_events"
- }
- }
- ],
- "workflows": [
- {
- "id": "xgb_controller",
- "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
- "args": {
- "num_rounds": "{num_rounds}",
- "data_split_mode": 0,
- "secure_training": false,
- "xgb_params": {
- "max_depth": 8,
- "eta": 0.1,
- "objective": "binary:logistic",
- "eval_metric": "auc",
- "tree_method": "hist",
- "nthread": 16
- },
- "xgb_options": {
- "early_stopping_rounds": 2
- }
- }
- }
- ]
-}
\ No newline at end of file
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/custom/higgs_data_loader.py b/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/custom/higgs_data_loader.py
deleted file mode 100644
index 6623e35fa3..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base_v2/app/custom/higgs_data_loader.py
+++ /dev/null
@@ -1,77 +0,0 @@
-# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import json
-
-import pandas as pd
-import xgboost as xgb
-
-from nvflare.app_opt.xgboost.data_loader import XGBDataLoader
-
-
-def _read_higgs_with_pandas(data_path, start: int, end: int):
- data_size = end - start
- data = pd.read_csv(data_path, header=None, skiprows=start, nrows=data_size)
- data_num = data.shape[0]
-
- # split to feature and label
- x = data.iloc[:, 1:].copy()
- y = data.iloc[:, 0].copy()
-
- return x, y, data_num
-
-
-class HIGGSDataLoader(XGBDataLoader):
- def __init__(self, data_split_filename):
- """Reads HIGGS dataset and return XGB data matrix.
-
- Args:
- data_split_filename: file name to data splits
- """
- self.data_split_filename = data_split_filename
-
- def load_data(self):
- with open(self.data_split_filename, "r") as file:
- data_split = json.load(file)
-
- data_path = data_split["data_path"]
- data_index = data_split["data_index"]
-
- # check if site_id and "valid" in the mapping dict
- if self.client_id not in data_index.keys():
- raise ValueError(
- f"Data does not contain Client {self.client_id} split",
- )
-
- if "valid" not in data_index.keys():
- raise ValueError(
- "Data does not contain Validation split",
- )
-
- site_index = data_index[self.client_id]
- valid_index = data_index["valid"]
-
- # training
- x_train, y_train, total_train_data_num = _read_higgs_with_pandas(
- data_path=data_path, start=site_index["start"], end=site_index["end"]
- )
- dmat_train = xgb.DMatrix(x_train, label=y_train)
-
- # validation
- x_valid, y_valid, total_valid_data_num = _read_higgs_with_pandas(
- data_path=data_path, start=valid_index["start"], end=valid_index["end"]
- )
- dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=self.data_split_mode)
-
- return dmat_train, dmat_valid
diff --git a/examples/advanced/xgboost/histogram-based/jobs/base_v2/meta.json b/examples/advanced/xgboost/histogram-based/jobs/base_v2/meta.json
deleted file mode 100644
index 6d82211a16..0000000000
--- a/examples/advanced/xgboost/histogram-based/jobs/base_v2/meta.json
+++ /dev/null
@@ -1,10 +0,0 @@
-{
- "name": "xgboost_histogram_based_v2",
- "resource_spec": {},
- "deploy_map": {
- "app": [
- "@ALL"
- ]
- },
- "min_clients": 2
-}
diff --git a/examples/advanced/xgboost/histogram-based/prepare_data.sh b/examples/advanced/xgboost/histogram-based/prepare_data.sh
deleted file mode 100755
index f7bdf9e68d..0000000000
--- a/examples/advanced/xgboost/histogram-based/prepare_data.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/usr/bin/env bash
-
-SCRIPT_DIR="$( dirname -- "$0"; )";
-
-bash "${SCRIPT_DIR}"/../prepare_data.sh
diff --git a/examples/advanced/xgboost/histogram-based/requirements.txt b/examples/advanced/xgboost/histogram-based/requirements.txt
deleted file mode 100644
index d79a5bef89..0000000000
--- a/examples/advanced/xgboost/histogram-based/requirements.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-nvflare~=2.5.0rc
-pandas
-scikit-learn
-torch
-tensorboard
-matplotlib
-shap
-# require xgboost 2.2 version, for now need to install a nightly build
-https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
diff --git a/examples/advanced/xgboost/histogram-based/run_experiment_centralized.sh b/examples/advanced/xgboost/histogram-based/run_experiment_centralized.sh
deleted file mode 100755
index 7a71f2d0a8..0000000000
--- a/examples/advanced/xgboost/histogram-based/run_experiment_centralized.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/usr/bin/env bash
-DATASET_PATH="$HOME/dataset/HIGGS.csv"
-
-if [ ! -f "${DATASET_PATH}" ]
-then
- echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
- exit 1
-fi
-python3 ../utils/baseline_centralized.py --num_parallel_tree 1 --train_in_one_session --data_path "${DATASET_PATH}"
diff --git a/examples/advanced/xgboost/histogram-based/run_experiment_simulator.sh b/examples/advanced/xgboost/histogram-based/run_experiment_simulator.sh
deleted file mode 100755
index eb6861c326..0000000000
--- a/examples/advanced/xgboost/histogram-based/run_experiment_simulator.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/usr/bin/env bash
-
-n=2
-study=histogram_uniform_split_uniform_lr
-nvflare simulator jobs/higgs_${n}_${study} -w ${PWD}/workspaces/xgboost_workspace_${n}_${study} -n ${n} -t ${n}
-
-n=5
-study=histogram_uniform_split_uniform_lr
-nvflare simulator jobs/higgs_${n}_${study} -w ${PWD}/workspaces/xgboost_workspace_${n}_${study} -n ${n} -t ${n}
diff --git a/examples/advanced/xgboost/prepare_data.sh b/examples/advanced/xgboost/prepare_data.sh
deleted file mode 100755
index f1a2e28675..0000000000
--- a/examples/advanced/xgboost/prepare_data.sh
+++ /dev/null
@@ -1,25 +0,0 @@
-#!/usr/bin/env bash
-DATASET_PATH="$HOME/dataset/HIGGS.csv"
-OUTPUT_PATH="/tmp/nvflare/xgboost_higgs_dataset"
-SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
-
-if [ ! -f "${DATASET_PATH}" ]
-then
- echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
-fi
-
-echo "Generated HIGGS data splits, reading from ${DATASET_PATH}"
-for site_num in 2 5 20;
-do
- for split_mode in uniform exponential square;
- do
- python3 ${SCRIPT_DIR}/utils/prepare_data_split.py \
- --data_path "${DATASET_PATH}" \
- --site_num ${site_num} \
- --size_total 11000000 \
- --size_valid 1000000 \
- --split_method ${split_mode} \
- --out_path "${OUTPUT_PATH}/${site_num}_${split_mode}"
- done
-done
-echo "Data splits are generated in ${OUTPUT_PATH}"
diff --git a/examples/advanced/xgboost/prepare_job_config.sh b/examples/advanced/xgboost/prepare_job_config.sh
deleted file mode 100755
index f839b46242..0000000000
--- a/examples/advanced/xgboost/prepare_job_config.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/usr/bin/env bash
-TREE_METHOD="hist"
-
-prepare_job_config() {
- python3 utils/prepare_job_config.py --site_num "$1" --training_algo "$2" --split_method "$3" \
- --lr_mode "$4" --nthread 16 --tree_method "$5"
-}
-
-echo "Generating job configs"
-prepare_job_config 5 bagging exponential scaled $TREE_METHOD
-prepare_job_config 5 bagging exponential uniform $TREE_METHOD
-prepare_job_config 5 bagging uniform uniform $TREE_METHOD
-prepare_job_config 5 cyclic exponential uniform $TREE_METHOD
-prepare_job_config 5 cyclic uniform uniform $TREE_METHOD
-
-prepare_job_config 20 bagging square scaled $TREE_METHOD
-prepare_job_config 20 bagging square uniform $TREE_METHOD
-prepare_job_config 20 bagging uniform uniform $TREE_METHOD
-prepare_job_config 20 cyclic square uniform $TREE_METHOD
-prepare_job_config 20 cyclic uniform uniform $TREE_METHOD
-
-prepare_job_config 2 histogram uniform uniform $TREE_METHOD
-prepare_job_config 5 histogram uniform uniform $TREE_METHOD
-prepare_job_config 2 histogram_v2 uniform uniform $TREE_METHOD
-prepare_job_config 5 histogram_v2 uniform uniform $TREE_METHOD
-echo "Job configs generated"
diff --git a/examples/advanced/xgboost_secure/requirements.txt b/examples/advanced/xgboost/requirements.txt
similarity index 90%
rename from examples/advanced/xgboost_secure/requirements.txt
rename to examples/advanced/xgboost/requirements.txt
index 2d9890c2c6..95cbefd2e9 100644
--- a/examples/advanced/xgboost_secure/requirements.txt
+++ b/examples/advanced/xgboost/requirements.txt
@@ -1,10 +1,12 @@
-nvflare~=2.5.0rc
-ipcl_python @ git+https://github.com/intel/pailliercryptolib_python.git@development
-# require xgboost 2.2 version, for now need to install a nightly build
-https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
+nvflare~=2.5.0
+openmined.psi==1.1.1
pandas
+torch
scikit-learn
shap
matplotlib
tensorboard
tenseal
+# requires xgboost 2.2; for now, install a nightly build
+https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
+ipcl_python @ git+https://github.com/intel/pailliercryptolib_python.git@development
diff --git a/examples/advanced/xgboost/tree-based/README.md b/examples/advanced/xgboost/tree-based/README.md
deleted file mode 100644
index ddcb545d09..0000000000
--- a/examples/advanced/xgboost/tree-based/README.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# Tree-based Federated Learning for XGBoost
-
-You can also follow along in this [notebook](./xgboost_tree_higgs.ipynb) for an interactive experience.
-
-## Cyclic Training
-
-"Cyclic XGBoost" is one way of performing tree-based federated boosting with multiple sites: at each round of tree boosting, instead of relying on the whole data statistics collected from all clients, the boosting relies on only 1 client's local data. The resulting tree sequence is then forwarded to the next client for next round's boosting. Such training scheme have been proposed in literatures [1] [2].
-
-## Bagging Aggregation
-
-"Bagging XGBoost" is another way of performing tree-based federated boosting with multiple sites: at each round of tree boosting, all sites start from the same "global model", and boost a number of trees (in current example, 1 tree) based on their local data. The resulting trees are then send to server. A bagging aggregation scheme is applied to all the submitted trees to update the global model, which is further distributed to all clients for next round's boosting.
-
-This scheme bears a certain similarity to the [Random Forest mode](https://xgboost.readthedocs.io/en/stable/tutorials/rf.html) of XGBoost, where a forest of `num_parallel_tree` trees is boosted on random row/column subsets rather than a single tree. Under the federated learning setting, the split is fixed per client rather than random, and no column subsampling is applied.
-
-In addition to the basic uniform shrinkage setting where all clients share the same learning rate, based on our research we enabled scaled shrinkage across clients, weighting each client's contribution according to its data size; this is shown to significantly improve the model's performance on non-uniform quantity splits of the HIGGS data.
-
-## Run automated experiments
-Please make sure to finish the [preparation steps](../README.md) before running the following steps.
-To run all experiments in this example with NVFlare, follow the steps below. To try out a single experiment, follow this [notebook](./xgboost_tree_higgs.ipynb).
-
-### Environment Preparation
-
-Switch to this directory and install the additional requirements (we suggest doing this inside a virtual environment):
-```
-python3 -m pip install -r requirements.txt
-```
-
-### Run federated experiments with simulator locally
-Next, we will use the NVFlare simulator to run FL training for all the different experiment configurations.
-```
-bash run_experiment_simulator.sh
-```
-
-### Run centralized experiments
-For comparison, we train baseline models in a centralized manner with the same number of training rounds.
-```
-bash run_experiment_centralized.sh
-```
-This will train several models w/ and w/o random forest settings. The results are shown below.
-
-
-
-As shown, random forest may not yield significant performance gain,
-and can even make the accuracy worse if subsample rate is too low (e.g. 0.05).
-
-### Results comparison on 5-client and 20-client under various training settings
-
-Let's now summarize the results of the federated learning experiments run above. We compare the AUC scores of
-the models on a standalone validation set consisting of the first 1 million instances of the HIGGS dataset.
-
-We provide a script for plotting the tensorboard records, running
-```
-python3 ./utils/plot_tensorboard_events.py
-```
-
-> **_NOTE:_** You need to install [./plot-requirements.txt](./plot-requirements.txt) to plot.
-
-
-The resulting validation AUC curves (no smoothing) are shown below:
-
-
-
-
-As illustrated, we can have the following observations:
-- cyclic training performs reasonably well under a uniform split (the purple curve), but suffers a significant performance drop under a non-uniform split (the brown curve)
-- bagging training performs better than cyclic under both uniform and non-uniform data splits (orange vs. purple, red/green vs. brown)
-- with uniform shrinkage, bagging suffers a significant performance drop under a non-uniform split (green vs. orange)
-- data-size-dependent shrinkage recovers the performance drop above (red vs. green) and achieves performance comparable to or better than the uniform data split (red vs. orange)
-- bagging under a uniform data split (orange), and bagging with data-size-dependent shrinkage under a non-uniform data split (red), achieve performance comparable to or better than the centralized training baseline (blue)
-
-For model size, centralized training and cyclic training will have a model consisting of `num_round` trees,
-while the bagging models consist of `num_round * num_client` trees, since each round,
-bagging training boosts a forest consisting of individually trained trees from each client.
-
-### Run federated experiments in real world
-
-To run in a federated setting, follow [Real-World FL](https://nvflare.readthedocs.io/en/main/real_world_fl.html) to
-start the overseer, FL servers and FL clients.
-
-You need to download the HIGGS data on each client site.
-You will also need to install XGBoost on each client site and the server site.
-
-You can still generate the data splits and job configs using the scripts provided.
-
-You will need to copy the generated data split file into each client site.
-You might also need to modify the `data_path` in the `data_site-XXX.json`
-inside the `/tmp/nvflare/xgboost_higgs_dataset` folder,
-since each site might save the HIGGS dataset in different places.
-
-Then you can use the admin client to submit the job via the `submit_job` command.
-
-## Customization
-
-To use a different dataset, you can inherit the base class `XGBDataLoader` and
-implement the `load_data()` method.
-
-
-## Reference
-[1] Zhao, L. et al., "InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy," IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, 2018, pp. 2087-2095
-
-[2] Yamamoto, F. et al., "New Approaches to Federated XGBoost Learning for Privacy-Preserving Data Analysis," ICONIP 2020 - International Conference on Neural Information Processing, 2020, Lecture Notes in Computer Science, vol 12533
diff --git a/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/config/config_fed_client.json b/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/config/config_fed_client.json
deleted file mode 100755
index ef0f19875b..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/config/config_fed_client.json
+++ /dev/null
@@ -1,41 +0,0 @@
-{
- "format_version": 2,
-
- "executors": [
- {
- "tasks": [
- "train"
- ],
- "executor": {
- "id": "Executor",
- "path": "nvflare.app_opt.xgboost.tree_based.executor.FedXGBTreeExecutor",
- "args": {
- "data_loader_id": "dataloader",
- "training_mode": "bagging",
- "num_client_bagging": 5,
- "num_local_parallel_tree": 1,
- "local_subsample": 1,
- "local_model_path": "model.json",
- "global_model_path": "model_global.json",
- "learning_rate": 0.1,
- "objective": "binary:logistic",
- "max_depth": 8,
- "eval_metric": "auc",
- "tree_method": "hist",
- "nthread": 16
- }
- }
- }
- ],
- "task_result_filters": [],
- "task_data_filters": [],
- "components": [
- {
- "id": "dataloader",
- "path": "higgs_data_loader.HIGGSDataLoader",
- "args": {
- "data_split_filename": "data_split.json"
- }
- }
- ]
-}
diff --git a/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/config/config_fed_server.json b/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/config/config_fed_server.json
deleted file mode 100755
index cfd7b83b54..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/config/config_fed_server.json
+++ /dev/null
@@ -1,48 +0,0 @@
-{
- "format_version": 2,
- "num_rounds": 101,
-
- "task_data_filters": [],
- "task_result_filters": [],
-
- "components": [
- {
- "id": "persistor",
- "path": "nvflare.app_opt.xgboost.tree_based.model_persistor.XGBModelPersistor",
- "args": {
- "save_name": "xgboost_model.json"
- }
- },
- {
- "id": "shareable_generator",
- "path": "nvflare.app_opt.xgboost.tree_based.shareable_generator.XGBModelShareableGenerator",
- "args": {}
- },
- {
- "id": "aggregator",
- "path": "nvflare.app_opt.xgboost.tree_based.bagging_aggregator.XGBBaggingAggregator",
- "args": {}
- }
- ],
- "workflows": [
- {
- "id": "scatter_and_gather",
- "path": "nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather",
- "args": {
- "min_clients": 5,
- "num_rounds": "{num_rounds}",
- "start_round": 0,
- "wait_time_after_min_received": 0,
- "aggregator_id": "aggregator",
- "persistor_id": "persistor",
- "shareable_generator_id": "shareable_generator",
- "train_task_name": "train",
- "train_timeout": 0,
- "allow_empty_global_weights": true,
- "task_check_period": 0.01,
- "persist_every_n_rounds": 0,
- "snapshot_every_n_rounds": 0
- }
- }
- ]
-}
diff --git a/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/custom/higgs_data_loader.py b/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/custom/higgs_data_loader.py
deleted file mode 100644
index 124268cfce..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/bagging_base/app/custom/higgs_data_loader.py
+++ /dev/null
@@ -1,77 +0,0 @@
-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import json
-
-import pandas as pd
-import xgboost as xgb
-
-from nvflare.app_opt.xgboost.data_loader import XGBDataLoader
-
-
-def _read_higgs_with_pandas(data_path, start: int, end: int):
- data_size = end - start
- data = pd.read_csv(data_path, header=None, skiprows=start, nrows=data_size)
- data_num = data.shape[0]
-
- # split to feature and label
- x = data.iloc[:, 1:].copy()
- y = data.iloc[:, 0].copy()
-
- return x, y, data_num
-
-
-class HIGGSDataLoader(XGBDataLoader):
- def __init__(self, data_split_filename):
- """Reads HIGGS dataset and return XGB data matrix.
-
- Args:
- data_split_filename: file name to data splits
- """
- self.data_split_filename = data_split_filename
-
- def load_data(self):
- with open(self.data_split_filename, "r") as file:
- data_split = json.load(file)
-
- data_path = data_split["data_path"]
- data_index = data_split["data_index"]
-
- # check if site_id and "valid" in the mapping dict
- if self.client_id not in data_index.keys():
- raise ValueError(
- f"Data does not contain Client {self.client_id} split",
- )
-
- if "valid" not in data_index.keys():
- raise ValueError(
- "Data does not contain Validation split",
- )
-
- site_index = data_index[self.client_id]
- valid_index = data_index["valid"]
-
- # training
- x_train, y_train, total_train_data_num = _read_higgs_with_pandas(
- data_path=data_path, start=site_index["start"], end=site_index["end"]
- )
- dmat_train = xgb.DMatrix(x_train, label=y_train)
-
- # validation
- x_valid, y_valid, total_valid_data_num = _read_higgs_with_pandas(
- data_path=data_path, start=valid_index["start"], end=valid_index["end"]
- )
- dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=self.data_split_mode)
-
- return dmat_train, dmat_valid
diff --git a/examples/advanced/xgboost/tree-based/jobs/bagging_base/meta.json b/examples/advanced/xgboost/tree-based/jobs/bagging_base/meta.json
deleted file mode 100644
index aa7ac49fd6..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/bagging_base/meta.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
- "name": "xgboost_tree_bagging",
- "resource_spec": {},
- "deploy_map": {
- "app": [
- "@ALL"
- ]
- }
-}
diff --git a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/config/config_fed_client.json b/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/config/config_fed_client.json
deleted file mode 100755
index d63a3ea551..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/config/config_fed_client.json
+++ /dev/null
@@ -1,39 +0,0 @@
-{
- "format_version": 2,
-
- "executors": [
- {
- "tasks": [
- "train"
- ],
- "executor": {
- "id": "Executor",
- "path": "nvflare.app_opt.xgboost.tree_based.executor.FedXGBTreeExecutor",
- "args": {
- "data_loader_id": "dataloader",
- "training_mode": "cyclic",
- "num_client_bagging": 1,
- "local_model_path": "model.json",
- "global_model_path": "model_global.json",
- "learning_rate": 0.1,
- "objective": "binary:logistic",
- "max_depth": 8,
- "eval_metric": "auc",
- "tree_method": "hist",
- "nthread": 16
- }
- }
- }
- ],
- "task_result_filters": [],
- "task_data_filters": [],
- "components": [
- {
- "id": "dataloader",
- "path": "higgs_data_loader.HIGGSDataLoader",
- "args": {
- "data_split_filename": "data_split.json"
- }
- }
- ]
-}
diff --git a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/config/config_fed_server.json b/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/config/config_fed_server.json
deleted file mode 100755
index 93a8e3cf4b..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/config/config_fed_server.json
+++ /dev/null
@@ -1,38 +0,0 @@
-{
- "format_version": 2,
- "num_rounds": 20,
- "task_data_filters": [],
- "task_result_filters": [],
-
- "components": [
- {
- "id": "persistor",
- "path": "nvflare.app_opt.xgboost.tree_based.model_persistor.XGBModelPersistor",
- "args": {
- "save_name": "xgboost_model.json",
- "load_as_dict": false
- }
- },
- {
- "id": "shareable_generator",
- "path": "nvflare.app_opt.xgboost.tree_based.shareable_generator.XGBModelShareableGenerator",
- "args": {}
- }
- ],
- "workflows": [
- {
- "id": "cyclic_ctl",
- "path": "nvflare.app_common.workflows.cyclic_ctl.CyclicController",
- "args": {
- "num_rounds": "{num_rounds}",
- "task_assignment_timeout": 60,
- "persistor_id": "persistor",
- "shareable_generator_id": "shareable_generator",
- "task_name": "train",
- "task_check_period": 0.01,
- "persist_every_n_rounds": 0,
- "snapshot_every_n_rounds": 0
- }
- }
- ]
-}
diff --git a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/custom/higgs_data_loader.py b/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/custom/higgs_data_loader.py
deleted file mode 100644
index 124268cfce..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/app/custom/higgs_data_loader.py
+++ /dev/null
@@ -1,77 +0,0 @@
-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import json
-
-import pandas as pd
-import xgboost as xgb
-
-from nvflare.app_opt.xgboost.data_loader import XGBDataLoader
-
-
-def _read_higgs_with_pandas(data_path, start: int, end: int):
- data_size = end - start
- data = pd.read_csv(data_path, header=None, skiprows=start, nrows=data_size)
- data_num = data.shape[0]
-
- # split to feature and label
- x = data.iloc[:, 1:].copy()
- y = data.iloc[:, 0].copy()
-
- return x, y, data_num
-
-
-class HIGGSDataLoader(XGBDataLoader):
- def __init__(self, data_split_filename):
- """Reads HIGGS dataset and return XGB data matrix.
-
- Args:
- data_split_filename: file name to data splits
- """
- self.data_split_filename = data_split_filename
-
- def load_data(self):
- with open(self.data_split_filename, "r") as file:
- data_split = json.load(file)
-
- data_path = data_split["data_path"]
- data_index = data_split["data_index"]
-
- # check if site_id and "valid" in the mapping dict
- if self.client_id not in data_index.keys():
- raise ValueError(
- f"Data does not contain Client {self.client_id} split",
- )
-
- if "valid" not in data_index.keys():
- raise ValueError(
- "Data does not contain Validation split",
- )
-
- site_index = data_index[self.client_id]
- valid_index = data_index["valid"]
-
- # training
- x_train, y_train, total_train_data_num = _read_higgs_with_pandas(
- data_path=data_path, start=site_index["start"], end=site_index["end"]
- )
- dmat_train = xgb.DMatrix(x_train, label=y_train)
-
- # validation
- x_valid, y_valid, total_valid_data_num = _read_higgs_with_pandas(
- data_path=data_path, start=valid_index["start"], end=valid_index["end"]
- )
- dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=self.data_split_mode)
-
- return dmat_train, dmat_valid
diff --git a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/meta.json b/examples/advanced/xgboost/tree-based/jobs/cyclic_base/meta.json
deleted file mode 100644
index 58450dbfdd..0000000000
--- a/examples/advanced/xgboost/tree-based/jobs/cyclic_base/meta.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
- "name": "xgboost_tree_cyclic",
- "resource_spec": {},
- "deploy_map": {
- "app": [
- "@ALL"
- ]
- }
-}
diff --git a/examples/advanced/xgboost/tree-based/plot-requirements.txt b/examples/advanced/xgboost/tree-based/plot-requirements.txt
deleted file mode 100644
index 7262e63060..0000000000
--- a/examples/advanced/xgboost/tree-based/plot-requirements.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-tensorflow
-seaborn
diff --git a/examples/advanced/xgboost/tree-based/prepare_data.sh b/examples/advanced/xgboost/tree-based/prepare_data.sh
deleted file mode 100755
index f7bdf9e68d..0000000000
--- a/examples/advanced/xgboost/tree-based/prepare_data.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/usr/bin/env bash
-
-SCRIPT_DIR="$( dirname -- "$0"; )";
-
-bash "${SCRIPT_DIR}"/../prepare_data.sh
diff --git a/examples/advanced/xgboost/tree-based/requirements.txt b/examples/advanced/xgboost/tree-based/requirements.txt
deleted file mode 100644
index d79a5bef89..0000000000
--- a/examples/advanced/xgboost/tree-based/requirements.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-nvflare~=2.5.0rc
-pandas
-scikit-learn
-torch
-tensorboard
-matplotlib
-shap
-# require xgboost 2.2 version, for now need to install a nightly build
-https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
diff --git a/examples/advanced/xgboost/tree-based/run_experiment_centralized.sh b/examples/advanced/xgboost/tree-based/run_experiment_centralized.sh
deleted file mode 100755
index 83cfa81162..0000000000
--- a/examples/advanced/xgboost/tree-based/run_experiment_centralized.sh
+++ /dev/null
@@ -1,13 +0,0 @@
-#!/usr/bin/env bash
-DATASET_PATH="$HOME/dataset/HIGGS.csv"
-
-if [ ! -f "${DATASET_PATH}" ]
-then
- echo "Please check if you saved HIGGS dataset in ${DATASET_PATH}"
-fi
-
-python3 ../utils/baseline_centralized.py --num_parallel_tree 1 --data_path "${DATASET_PATH}"
-python3 ../utils/baseline_centralized.py --num_parallel_tree 5 --subsample 0.8 --data_path "${DATASET_PATH}"
-python3 ../utils/baseline_centralized.py --num_parallel_tree 5 --subsample 0.2 --data_path "${DATASET_PATH}"
-python3 ../utils/baseline_centralized.py --num_parallel_tree 20 --subsample 0.05 --data_path "${DATASET_PATH}"
-python3 ../utils/baseline_centralized.py --num_parallel_tree 20 --subsample 0.8 --data_path "${DATASET_PATH}"
diff --git a/examples/advanced/xgboost/tree-based/run_experiment_simulator.sh b/examples/advanced/xgboost/tree-based/run_experiment_simulator.sh
deleted file mode 100755
index 05b2a050e7..0000000000
--- a/examples/advanced/xgboost/tree-based/run_experiment_simulator.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/usr/bin/env bash
-
-n=5
-for study in bagging_uniform_split_uniform_lr \
- bagging_exponential_split_uniform_lr \
- bagging_exponential_split_scaled_lr \
- cyclic_uniform_split_uniform_lr \
- cyclic_exponential_split_uniform_lr
-do
- nvflare simulator jobs/higgs_${n}_${study} -w ${PWD}/workspaces/xgboost_workspace_${n}_${study} -n ${n} -t ${n}
-done
-
-
-n=20
-for study in bagging_uniform_split_uniform_lr \
- bagging_square_split_uniform_lr \
- bagging_square_split_scaled_lr \
- cyclic_uniform_split_uniform_lr \
- cyclic_square_split_uniform_lr
-do
- nvflare simulator jobs/higgs_${n}_${study} -w ${PWD}/workspaces/xgboost_workspace_${n}_${study} -n ${n} -t ${n}
-done
diff --git a/examples/advanced/xgboost/tree-based/utils/plot_tensorboard_events.py b/examples/advanced/xgboost/tree-based/utils/plot_tensorboard_events.py
deleted file mode 100644
index bc6953f274..0000000000
--- a/examples/advanced/xgboost/tree-based/utils/plot_tensorboard_events.py
+++ /dev/null
@@ -1,136 +0,0 @@
-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import glob
-import os
-
-import matplotlib.pyplot as plt
-import seaborn as sns
-import tensorflow as tf
-
-# simulator workspace
-client_results_root = "./workspaces/xgboost_workspace_"
-client_num_list = [5, 20]
-client_pre = "app_site-"
-centralized_path = "./workspaces/centralized_1_1/events.*"
-
-# bagging and cyclic need different handle
-experiments_bagging = {
- 5: {
- "5_bagging_uniform_split_uniform_lr": {"tag": "AUC"},
- "5_bagging_exponential_split_uniform_lr": {"tag": "AUC"},
- "5_bagging_exponential_split_scaled_lr": {"tag": "AUC"},
- },
- 20: {
- "20_bagging_uniform_split_uniform_lr": {"tag": "AUC"},
- "20_bagging_square_split_uniform_lr": {"tag": "AUC"},
- "20_bagging_square_split_scaled_lr": {"tag": "AUC"},
- },
-}
-experiments_cyclic = {
- 5: {
- "5_cyclic_uniform_split_uniform_lr": {"tag": "AUC"},
- "5_cyclic_exponential_split_uniform_lr": {"tag": "AUC"},
- },
- 20: {
- "20_cyclic_uniform_split_uniform_lr": {"tag": "AUC"},
- "20_cyclic_square_split_uniform_lr": {"tag": "AUC"},
- },
-}
-
-weight = 0.0
-
-
-def smooth(scalars, weight): # Weight between 0 and 1
- last = scalars[0] # First value in the plot (first timestep)
- smoothed = list()
- for point in scalars:
- smoothed_val = last * weight + (1 - weight) * point # Calculate smoothed value
- smoothed.append(smoothed_val) # Save it
- last = smoothed_val # Anchor the last smoothed value
- return smoothed
-
-
-def read_eventfile(filepath, tags=["AUC"]):
- data = {}
- for summary in tf.compat.v1.train.summary_iterator(filepath):
- for v in summary.summary.value:
- if v.tag in tags:
- if v.tag in data.keys():
- data[v.tag].append([summary.step, v.simple_value])
- else:
- data[v.tag] = [[summary.step, v.simple_value]]
- return data
-
-
-def add_eventdata(data, config, filepath, tag="AUC"):
- event_data = read_eventfile(filepath, tags=[tag])
- assert len(event_data[tag]) > 0, f"No data for key {tag}"
-
- metric = []
- for e in event_data[tag]:
- # print(e)
- data["Config"].append(config)
- data["Round"].append(e[0])
- metric.append(e[1])
-
- metric = smooth(metric, weight)
- for entry in metric:
- data["AUC"].append(entry)
-
- print(f"added {len(event_data[tag])} entries for {tag}")
-
-
-def main():
- plt.figure()
-
- for client_num in client_num_list:
- plt.figure
- plt.title(f"{client_num} client experiments")
- # add event files
- data = {"Config": [], "Round": [], "AUC": []}
- # add centralized result
- eventfile = glob.glob(centralized_path, recursive=True)
- assert len(eventfile) == 1, "No unique event file found!" + eventfile
- eventfile = eventfile[0]
- print("adding", eventfile)
- add_eventdata(data, "centralized", eventfile, tag="AUC")
- # pick first client for bagging experiments
- site = 1
- for config, exp in experiments_bagging[client_num].items():
- record_path = os.path.join(client_results_root + config, "simulate_job", client_pre + str(site), "events.*")
- eventfile = glob.glob(record_path, recursive=True)
- assert len(eventfile) == 1, "No unique event file found!"
- eventfile = eventfile[0]
- print("adding", eventfile)
- add_eventdata(data, config, eventfile, tag=exp["tag"])
-
- # Combine all clients' records for cyclic experiments
- for site in range(1, client_num + 1):
- for config, exp in experiments_cyclic[client_num].items():
- record_path = os.path.join(
- client_results_root + config, "simulate_job", client_pre + str(site), "events.*"
- )
- eventfile = glob.glob(record_path, recursive=True)
- assert len(eventfile) == 1, f"No unique event file found under {record_path}!"
- eventfile = eventfile[0]
- print("adding", eventfile)
- add_eventdata(data, config, eventfile, tag=exp["tag"])
-
- sns.lineplot(x="Round", y="AUC", hue="Config", data=data)
- plt.show()
-
-
-if __name__ == "__main__":
- main()
diff --git a/examples/advanced/xgboost/utils/prepare_job_config.py b/examples/advanced/xgboost/utils/prepare_job_config.py
deleted file mode 100644
index c7339391ab..0000000000
--- a/examples/advanced/xgboost/utils/prepare_job_config.py
+++ /dev/null
@@ -1,239 +0,0 @@
-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import argparse
-import json
-import os
-import pathlib
-import shutil
-
-from nvflare.apis.fl_constant import JobConstants
-
-SCRIPT_PATH = pathlib.Path(os.path.realpath(__file__))
-XGB_EXAMPLE_ROOT = SCRIPT_PATH.parent.parent.absolute()
-JOB_CONFIGS_ROOT = "jobs"
-ALGO_DIR_MAP = {
- "bagging": "tree-based",
- "cyclic": "tree-based",
- "histogram": "histogram-based",
- "histogram_v2": "histogram-based",
-}
-BASE_JOB_MAP = {"bagging": "bagging_base", "cyclic": "cyclic_base", "histogram": "base", "histogram_v2": "base_v2"}
-
-
-def job_config_args_parser():
- parser = argparse.ArgumentParser(description="generate train configs for HIGGS dataset")
- parser.add_argument(
- "--data_root",
- type=str,
- default="/tmp/nvflare/xgboost_higgs_dataset",
- help="Path to dataset config files for each site",
- )
- parser.add_argument("--site_num", type=int, default=5, help="Total number of sites")
- parser.add_argument("--site_name_prefix", type=str, default="site-", help="Site name prefix")
- parser.add_argument("--round_num", type=int, default=100, help="Total number of training rounds")
- parser.add_argument(
- "--training_algo", type=str, default="bagging", choices=list(ALGO_DIR_MAP.keys()), help="Training algorithm"
- )
- parser.add_argument("--split_method", type=str, default="uniform", help="How to split the dataset")
- parser.add_argument("--lr_mode", type=str, default="uniform", help="Whether to use uniform or scaled shrinkage")
- parser.add_argument("--nthread", type=int, default=16, help="nthread for xgboost")
- parser.add_argument(
- "--tree_method", type=str, default="hist", help="tree_method for xgboost - use hist for best perf"
- )
- parser.add_argument("--data_split_mode", type=int, default=0, help="dataset split mode, 0 or 1")
- parser.add_argument("--secure_training", type=bool, default=False, help="histogram_v2 secure training or not")
- return parser
-
-
-def _read_json(filename):
- if not os.path.isfile(filename):
- raise ValueError(f"{filename} does not exist!")
- with open(filename, "r") as f:
- return json.load(f)
-
-
-def _write_json(data, filename):
- with open(filename, "w") as f:
- json.dump(data, f, indent=4)
-
-
-def _get_job_name(args) -> str:
- return (
- "higgs_"
- + str(args.site_num)
- + "_"
- + args.training_algo
- + "_"
- + args.split_method
- + "_split"
- + "_"
- + args.lr_mode
- + "_lr"
- )
-
-
-def _get_data_split_name(args, site_name: str) -> str:
- return os.path.join(args.data_root, f"{args.site_num}_{args.split_method}", f"data_{site_name}.json")
-
-
-def _get_src_job_dir(training_algo):
- return XGB_EXAMPLE_ROOT / ALGO_DIR_MAP[training_algo] / JOB_CONFIGS_ROOT / BASE_JOB_MAP[training_algo]
-
-
-def _gen_deploy_map(num_sites: int, site_name_prefix: str) -> dict:
- deploy_map = {"app_server": ["server"]}
- for i in range(1, num_sites + 1):
- deploy_map[f"app_{site_name_prefix}{i}"] = [f"{site_name_prefix}{i}"]
- return deploy_map
-
-
-def _update_meta(meta: dict, args):
- name = _get_job_name(args)
- meta["name"] = name
- meta["deploy_map"] = _gen_deploy_map(args.site_num, args.site_name_prefix)
- meta["min_clients"] = args.site_num
-
-
-def _get_lr_scale_from_split_json(data_split: dict):
- split = {}
- total_data_num = 0
- for k, v in data_split["data_index"].items():
- if k == "valid":
- continue
- data_num = int(v["end"] - v["start"])
- total_data_num += data_num
- split[k] = data_num
-
- lr_scales = {}
- for k in split:
- lr_scales[k] = split[k] / total_data_num
-
- return lr_scales
-
-
-def _update_client_config(config: dict, args, lr_scale, site_name: str):
- data_split_name = _get_data_split_name(args, site_name)
- if args.training_algo == "bagging" or args.training_algo == "cyclic":
- # update client config
- config["executors"][0]["executor"]["args"]["lr_scale"] = lr_scale
- config["executors"][0]["executor"]["args"]["lr_mode"] = args.lr_mode
- config["executors"][0]["executor"]["args"]["nthread"] = args.nthread
- config["executors"][0]["executor"]["args"]["tree_method"] = args.tree_method
- config["executors"][0]["executor"]["args"]["training_mode"] = args.training_algo
- num_client_bagging = 1
- if args.training_algo == "bagging":
- num_client_bagging = args.site_num
- config["executors"][0]["executor"]["args"]["num_client_bagging"] = num_client_bagging
- elif args.training_algo == "histogram":
- config["num_rounds"] = args.round_num
- config["executors"][0]["executor"]["args"]["xgb_params"]["nthread"] = args.nthread
- config["executors"][0]["executor"]["args"]["xgb_params"]["tree_method"] = args.tree_method
- config["components"][0]["args"]["data_split_filename"] = data_split_name
-
-
-def _update_server_config(config: dict, args):
- if args.training_algo == "bagging":
- config["num_rounds"] = args.round_num + 1
- config["workflows"][0]["args"]["min_clients"] = args.site_num
- elif args.training_algo == "cyclic":
- config["num_rounds"] = int(args.round_num / args.site_num)
- elif args.training_algo == "histogram_v2":
- config["num_rounds"] = args.round_num
- config["workflows"][0]["args"]["xgb_params"]["nthread"] = args.nthread
- config["workflows"][0]["args"]["xgb_params"]["tree_method"] = args.tree_method
- config["workflows"][0]["args"]["data_split_mode"] = args.data_split_mode
- config["workflows"][0]["args"]["secure_training"] = args.secure_training
-
-
-def _copy_custom_files(src_job_path, src_app_name, dst_job_path, dst_app_name):
- dst_path = dst_job_path / dst_app_name / "custom"
- os.makedirs(dst_path, exist_ok=True)
- src_path = src_job_path / src_app_name / "custom"
- if os.path.isdir(src_path):
- shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
-
-
-def create_server_app(src_job_path, src_app_name, dst_job_path, site_name, args):
- dst_app_name = f"app_{site_name}"
- server_config = _read_json(src_job_path / src_app_name / "config" / JobConstants.SERVER_JOB_CONFIG)
- dst_config_path = dst_job_path / dst_app_name / "config"
-
- # make target config folders
- if not os.path.exists(dst_config_path):
- os.makedirs(dst_config_path)
-
- _update_server_config(server_config, args)
- server_config_filename = dst_config_path / JobConstants.SERVER_JOB_CONFIG
- _write_json(server_config, server_config_filename)
-
-
-def create_client_app(src_job_path, src_app_name, dst_job_path, site_name, args):
- dst_app_name = f"app_{site_name}"
- client_config = _read_json(src_job_path / src_app_name / "config" / JobConstants.CLIENT_JOB_CONFIG)
- dst_config_path = dst_job_path / dst_app_name / "config"
-
- # make target config folders
- if not os.path.exists(dst_config_path):
- os.makedirs(dst_config_path)
-
- # get lr scale
- data_split_name = _get_data_split_name(args, site_name)
- data_split = _read_json(data_split_name)
- lr_scales = _get_lr_scale_from_split_json(data_split)
-
- # adjust file contents according to each job's specs
- _update_client_config(client_config, args, lr_scales[site_name], site_name)
- client_config_filename = dst_config_path / JobConstants.CLIENT_JOB_CONFIG
- _write_json(client_config, client_config_filename)
-
- # copy custom file
- _copy_custom_files(src_job_path, src_app_name, dst_job_path, dst_app_name)
-
-
-def main():
- parser = job_config_args_parser()
- args = parser.parse_args()
- job_name = _get_job_name(args)
- src_job_path = _get_src_job_dir(args.training_algo)
-
- # create a new job
- dst_job_path = XGB_EXAMPLE_ROOT / ALGO_DIR_MAP[args.training_algo] / JOB_CONFIGS_ROOT / job_name
- if not os.path.exists(dst_job_path):
- os.makedirs(dst_job_path)
-
- # update meta
- meta_config_dst = dst_job_path / JobConstants.META_FILE
- meta_config = _read_json(src_job_path / JobConstants.META_FILE)
- _update_meta(meta_config, args)
- _write_json(meta_config, meta_config_dst)
-
- # create server side app
- create_server_app(
- src_job_path=src_job_path, src_app_name="app", dst_job_path=dst_job_path, site_name="server", args=args
- )
-
- # create client side app
- for i in range(1, args.site_num + 1):
- create_client_app(
- src_job_path=src_job_path,
- src_app_name="app",
- dst_job_path=dst_job_path,
- site_name=f"{args.site_name_prefix}{i}",
- args=args,
- )
-
-
-if __name__ == "__main__":
- main()
diff --git a/examples/hello-world/hello-flower/README.md b/examples/hello-world/hello-flower/README.md
index 6ef847ad54..3ec8d96cc5 100644
--- a/examples/hello-world/hello-flower/README.md
+++ b/examples/hello-world/hello-flower/README.md
@@ -23,7 +23,10 @@ If you haven't already, we recommend creating a virtual environment.
python3 -m venv nvflare_flwr
source nvflare_flwr/bin/activate
```
-
+We recommend installing an older version of NumPy, as torch/torchvision do not support NumPy 2 at this time.
+```bash
+pip install numpy==1.26.4
+```
## 2.1 Run a simulation
To run flwr-pt job with NVFlare, we first need to install its dependencies.
@@ -49,3 +52,23 @@ the TensorBoard metrics to the server at each iteration using NVFlare's metric s
```bash
python job.py --job_name "flwr-pt-tb" --content_dir "./flwr-pt-tb" --stream_metrics
```
+
+You can visualize the metrics streamed to the server using TensorBoard.
+```bash
+tensorboard --logdir /tmp/nvflare/hello-flower
+```
+
+
+## Notes
+Make sure the `pyproject.toml` files in your Flower apps contain an `address` field. This field must be present because the `--federation-config` option of the `flwr run` command tries to override it.
+Your `pyproject.toml` should include a section similar to this:
+```
+[tool.flwr.federations]
+default = "xxx"
+
+[tool.flwr.federations.xxx]
+options.num-supernodes = 2
+address = "127.0.0.1:9093"
+insecure = false
+```
+The value of `options.num-supernodes` should match the number of NVFlare clients defined in [job.py](./job.py), e.g., `job.simulator_run(args.workdir, gpu="0", n_clients=2)`.
diff --git a/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/client.py b/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/client.py
index 35f6668483..ad3f836e54 100644
--- a/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/client.py
+++ b/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/client.py
@@ -28,8 +28,6 @@
# initializes NVFlare interface
from nvflare.client.tracking import SummaryWriter
-flare.init()
-
# Define FlowerClient and client_fn
class FlowerClient(NumPyClient):
@@ -81,3 +79,15 @@ def client_fn(context: Context):
app = ClientApp(
client_fn=client_fn,
)
+
+
+@app.enter()
+def enter(ctxt: Context) -> None:
+ flare.init()
+ print("ClientApp entering. Flare initialized.")
+
+
+@app.exit()
+def exit(ctxt: Context) -> None:
+ flare.shutdown()
+ print("ClientApp exiting. Flare shutdown.")
diff --git a/examples/hello-world/hello-flower/flwr-pt-tb/pyproject.toml b/examples/hello-world/hello-flower/flwr-pt-tb/pyproject.toml
index 12b99c5c63..3bbe943217 100644
--- a/examples/hello-world/hello-flower/flwr-pt-tb/pyproject.toml
+++ b/examples/hello-world/hello-flower/flwr-pt-tb/pyproject.toml
@@ -8,8 +8,8 @@ version = "1.0.0"
description = ""
license = "Apache-2.0"
dependencies = [
- "flwr[simulation]>=1.11.0,<2.0",
- "nvflare~=2.5.0rc",
+ "flwr[simulation]>=1.15.2,<2.0",
+ "nvflare~=2.6.0rc",
"torch==2.2.1",
"torchvision==0.17.1",
"tensorboard"
@@ -33,3 +33,5 @@ default = "local-simulation"
[tool.flwr.federations.local-simulation]
options.num-supernodes = 2
+address = "127.0.0.1:9093"
+insecure = true
\ No newline at end of file
diff --git a/examples/hello-world/hello-flower/flwr-pt/pyproject.toml b/examples/hello-world/hello-flower/flwr-pt/pyproject.toml
index 8624601c1b..2aa6c813e7 100644
--- a/examples/hello-world/hello-flower/flwr-pt/pyproject.toml
+++ b/examples/hello-world/hello-flower/flwr-pt/pyproject.toml
@@ -8,10 +8,11 @@ version = "1.0.0"
description = ""
license = "Apache-2.0"
dependencies = [
- "flwr[simulation]>=1.11.0,<2.0",
- "nvflare~=2.5.0rc",
+ "flwr[simulation]>=1.15.2,<2.0",
+ "nvflare~=2.6.0rc",
"torch==2.2.1",
"torchvision==0.17.1",
+ "tensorboard"
]
[tool.hatch.build.targets.wheel]
@@ -32,3 +33,5 @@ default = "local-simulation"
[tool.flwr.federations.local-simulation]
options.num-supernodes = 2
+address = "127.0.0.1:9093"
+insecure = true
\ No newline at end of file
diff --git a/examples/hello-world/hello-flower/train.png b/examples/hello-world/hello-flower/train.png
new file mode 100644
index 0000000000..7f4a1ebda6
Binary files /dev/null and b/examples/hello-world/hello-flower/train.png differ
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.1_federated_statistics/federated_statistics_introduction.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.1_federated_statistics/federated_statistics_introduction.ipynb
new file mode 100644
index 0000000000..5fcc41dfed
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.1_federated_statistics/federated_statistics_introduction.ipynb
@@ -0,0 +1,23 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Federated Statistics Introduction\n",
+ "\n",
+    "In a federated learning setting, data stays private at each site, so gathering statistics requires careful consideration to preserve data privacy. We provide two examples, one for image data and one for tabular data:\n",
+ "\n",
+ " * [Federated Statistics with image data](./federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb) shows how to compute local and global image statistics with the consideration that data is private at each of the client sites.\n",
+ " * [Federated Statistics with tabular data](./federated_statistics_with_tabular_data/federated_statistics_with_tabular_data.ipynb) demonstrates how to create federated statistics for data that can be represented as Pandas DataFrames."
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb
index 0693c707c7..71f9a334c6 100644
--- a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb
@@ -424,7 +424,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -438,7 +438,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.14"
+ "version": "3.10.12"
}
},
"nbformat": 4,
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/figs/minibatch.png b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/figs/minibatch.png
new file mode 100644
index 0000000000..6c18b663a3
Binary files /dev/null and b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/figs/minibatch.png differ
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/kmeans_job.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/kmeans_job.py
new file mode 100644
index 0000000000..be7a31795a
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/kmeans_job.py
@@ -0,0 +1,240 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+from enum import Enum
+from typing import List
+
+import numpy as np
+from src.kmeans_assembler import KMeansAssembler
+from src.kmeans_learner import KMeansLearner
+
+from nvflare import FedJob
+from nvflare.app_common.aggregators.collect_and_assemble_aggregator import CollectAndAssembleAggregator
+from nvflare.app_common.shareablegenerators.full_model_shareable_generator import FullModelShareableGenerator
+from nvflare.app_common.workflows.scatter_and_gather import ScatterAndGather
+from nvflare.app_opt.sklearn.joblib_model_param_persistor import JoblibModelParamPersistor
+from nvflare.app_opt.sklearn.sklearn_executor import SKLearnExecutor
+
+
+class SplitMethod(Enum):
+ UNIFORM = "uniform"
+ LINEAR = "linear"
+ SQUARE = "square"
+ EXPONENTIAL = "exponential"
+
+
+def get_split_ratios(site_num: int, split_method: SplitMethod):
+ if split_method == SplitMethod.UNIFORM:
+ ratio_vec = np.ones(site_num)
+ elif split_method == SplitMethod.LINEAR:
+ ratio_vec = np.linspace(1, site_num, num=site_num)
+ elif split_method == SplitMethod.SQUARE:
+ ratio_vec = np.square(np.linspace(1, site_num, num=site_num))
+ elif split_method == SplitMethod.EXPONENTIAL:
+ ratio_vec = np.exp(np.linspace(1, site_num, num=site_num))
+ else:
+ raise ValueError(f"Split method {split_method.name} not implemented!")
+
+ return ratio_vec
+
+
+def split_num_proportion(n, site_num, split_method: SplitMethod) -> List[int]:
+ split = []
+ ratio_vec = get_split_ratios(site_num, split_method)
+ total = sum(ratio_vec)
+ left = n
+ for site in range(site_num - 1):
+ x = int(n * ratio_vec[site] / total)
+ left = left - x
+ split.append(x)
+ split.append(left)
+ return split
+
+
+def assign_data_index_to_sites(
+ data_size: int,
+ valid_fraction: float,
+ num_sites: int,
+ split_method: SplitMethod = SplitMethod.UNIFORM,
+) -> dict:
+ if valid_fraction > 1.0:
+ raise ValueError("validation percent should be less than or equal to 100% of the total data")
+ elif valid_fraction < 1.0:
+ valid_size = int(round(data_size * valid_fraction, 0))
+ train_size = data_size - valid_size
+ else:
+ valid_size = data_size
+ train_size = data_size
+
+ site_sizes = split_num_proportion(train_size, num_sites, split_method)
+ split_data_indices = {
+ "valid": {"start": 0, "end": valid_size},
+ }
+ for site in range(num_sites):
+ site_id = site + 1
+ if valid_fraction < 1.0:
+ idx_start = valid_size + sum(site_sizes[:site])
+ idx_end = valid_size + sum(site_sizes[: site + 1])
+ else:
+ idx_start = sum(site_sizes[:site])
+ idx_end = sum(site_sizes[: site + 1])
+ split_data_indices[site_id] = {"start": idx_start, "end": idx_end}
+
+ return split_data_indices
+
+
+def get_file_line_count(input_path: str) -> int:
+ count = 0
+ with open(input_path, "r") as fp:
+ for i, _ in enumerate(fp):
+ count += 1
+ return count
+
+
+def split_data(
+ data_path: str,
+ num_clients: int,
+ valid_frac: float,
+ split_method: SplitMethod = SplitMethod.UNIFORM,
+):
+ size_total_file = get_file_line_count(data_path)
+ site_indices = assign_data_index_to_sites(size_total_file, valid_frac, num_clients, split_method)
+ return site_indices
+
+
+def define_parser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--workspace_dir",
+ type=str,
+ default="/tmp/nvflare/workspace/works/kmeans",
+ help="work directory, default to '/tmp/nvflare/workspace/works/kmeans'",
+ )
+ parser.add_argument(
+ "--job_dir",
+ type=str,
+ default="/tmp/nvflare/workspace/jobs/kmeans",
+ help="directory for job export, default to '/tmp/nvflare/workspace/jobs/kmeans'",
+ )
+ parser.add_argument(
+ "--data_path",
+ type=str,
+ default="/tmp/nvflare/dataset/sklearn_iris.csv",
+        help="data file path, default to '/tmp/nvflare/dataset/sklearn_iris.csv'",
+ )
+ parser.add_argument(
+ "--num_clients",
+ type=int,
+ default=3,
+ help="number of clients to simulate, default to 3",
+ )
+ parser.add_argument(
+ "--num_rounds",
+ type=int,
+ default=5,
+ help="number of rounds, default to 5",
+ )
+ parser.add_argument(
+ "--split_mode",
+ type=str,
+ default="uniform",
+ choices=["uniform", "linear", "square", "exponential"],
+ help="how to split data among clients",
+ )
+ parser.add_argument(
+ "--valid_frac",
+ type=float,
+ default=1,
+ help="fraction of data to use for validation, default to perform validation on all data",
+ )
+ return parser.parse_args()
+
+
+def main():
+ args = define_parser()
+ # Get args
+ data_path = args.data_path
+ num_clients = args.num_clients
+ num_rounds = args.num_rounds
+ split_mode = args.split_mode
+ valid_frac = args.valid_frac
+ job_name = f"sklearn_kmeans_{split_mode}_{num_clients}_clients"
+
+ # Set the output workspace and job directories
+ workspace_dir = os.path.join(args.workspace_dir, job_name)
+ job_dir = args.job_dir
+
+ # Create the FedJob
+ job = FedJob(name=job_name, min_clients=num_clients)
+
+ # Define the controller workflow and send to server
+ controller = ScatterAndGather(
+ min_clients=num_clients,
+ num_rounds=num_rounds,
+ aggregator_id="aggregator",
+ persistor_id="persistor",
+ shareable_generator_id="shareable_generator",
+ train_task_name="train",
+ )
+ job.to_server(controller, id="scatter_and_gather")
+
+ # Define other server components
+ assembler = KMeansAssembler()
+ job.to_server(assembler, id="kmeans_assembler")
+ aggregator = CollectAndAssembleAggregator(assembler_id="kmeans_assembler")
+ job.to_server(aggregator, id="aggregator")
+ shareable_generator = FullModelShareableGenerator()
+ job.to_server(shareable_generator, id="shareable_generator")
+ persistor = JoblibModelParamPersistor(
+ initial_params={"n_clusters": 3},
+ )
+ job.to_server(persistor, id="persistor")
+
+ # Get the data split numbers and send to each client
+ # generate data split
+ site_indices = split_data(
+ data_path,
+ num_clients,
+ valid_frac,
+ SplitMethod(split_mode),
+ )
+
+ for i in range(1, num_clients + 1):
+ # Define the executor and send to clients
+ runner = SKLearnExecutor(learner_id="kmeans_learner")
+ job.to(runner, f"site-{i}", tasks=["train"])
+
+ learner = KMeansLearner(
+ data_path=data_path,
+ train_start=site_indices[i]["start"],
+ train_end=site_indices[i]["end"],
+ valid_start=site_indices["valid"]["start"],
+ valid_end=site_indices["valid"]["end"],
+ random_state=0,
+ )
+ job.to(learner, f"site-{i}", id="kmeans_learner")
+
+ # Export the job
+ print("job_dir=", job_dir)
+ job.export_job(job_dir)
+
+ # Run the job
+ print("workspace_dir=", workspace_dir)
+ job.simulator_run(workspace_dir)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/requirements.txt b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/requirements.txt
new file mode 100644
index 0000000000..b72d5c2798
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/requirements.txt
@@ -0,0 +1,4 @@
+pandas
+scikit-learn
+joblib
+tensorboard
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/src/kmeans_assembler.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/src/kmeans_assembler.py
new file mode 100644
index 0000000000..23e6fdc62e
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/src/kmeans_assembler.py
@@ -0,0 +1,75 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Dict
+
+import numpy as np
+from sklearn.cluster import KMeans
+
+from nvflare.apis.dxo import DXO, DataKind
+from nvflare.apis.fl_context import FLContext
+from nvflare.app_common.aggregators.assembler import Assembler
+from nvflare.app_common.app_constant import AppConstants
+
+
+class KMeansAssembler(Assembler):
+ def __init__(self):
+ super().__init__(data_kind=DataKind.WEIGHTS)
+ # Aggregator needs to keep record of historical
+ # center and count information for mini-batch kmeans
+ self.center = None
+ self.count = None
+ self.n_cluster = 0
+
+ def get_model_params(self, dxo: DXO):
+ data = dxo.data
+ return {"center": data["center"], "count": data["count"]}
+
+ def assemble(self, data: Dict[str, dict], fl_ctx: FLContext) -> DXO:
+ current_round = fl_ctx.get_prop(AppConstants.CURRENT_ROUND)
+ if current_round == 0:
+ # First round, collect the information regarding n_feature and n_cluster
+ # Initialize the aggregated center and count to all zero
+ client_0 = list(self.collection.keys())[0]
+ self.n_cluster = self.collection[client_0]["center"].shape[0]
+ n_feature = self.collection[client_0]["center"].shape[1]
+ self.center = np.zeros([self.n_cluster, n_feature])
+ self.count = np.zeros([self.n_cluster])
+ # perform one round of KMeans over the submitted centers
+ # to be used as the original center points
+ # no count for this round
+ center_collect = []
+ for _, record in self.collection.items():
+ center_collect.append(record["center"])
+ centers = np.concatenate(center_collect)
+ kmeans_center_initial = KMeans(n_clusters=self.n_cluster)
+ kmeans_center_initial.fit(centers)
+ self.center = kmeans_center_initial.cluster_centers_
+ else:
+ # Mini-batch k-Means step to assemble the received centers
+ for center_idx in range(self.n_cluster):
+ centers_global_rescale = self.center[center_idx] * self.count[center_idx]
+ # Aggregate center, add new center to previous estimate, weighted by counts
+ for _, record in self.collection.items():
+ centers_global_rescale += record["center"][center_idx] * record["count"][center_idx]
+ self.count[center_idx] += record["count"][center_idx]
+ # Rescale to compute mean of all points (old and new combined)
+ alpha = 1 / self.count[center_idx]
+ centers_global_rescale *= alpha
+ # Update the global center
+ self.center[center_idx] = centers_global_rescale
+ params = {"center": self.center}
+ dxo = DXO(data_kind=self.expected_data_kind, data=params)
+
+ return dxo
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/src/kmeans_learner.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/src/kmeans_learner.py
new file mode 100644
index 0000000000..61c96a5abe
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/src/kmeans_learner.py
@@ -0,0 +1,116 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Optional, Tuple
+
+from sklearn.cluster import KMeans, MiniBatchKMeans, kmeans_plusplus
+from sklearn.metrics import homogeneity_score
+
+from nvflare.apis.fl_context import FLContext
+from nvflare.app_common.abstract.learner_spec import Learner
+from nvflare.app_opt.sklearn.data_loader import load_data_for_range
+
+
+class KMeansLearner(Learner):
+ def __init__(
+ self,
+ data_path: str,
+ train_start: int,
+ train_end: int,
+ valid_start: int,
+ valid_end: int,
+ random_state: int = None,
+ max_iter: int = 1,
+ n_init: int = 1,
+ reassignment_ratio: int = 0,
+ ):
+ super().__init__()
+ self.data_path = data_path
+ self.train_start = train_start
+ self.train_end = train_end
+ self.valid_start = valid_start
+ self.valid_end = valid_end
+
+ self.random_state = random_state
+ self.max_iter = max_iter
+ self.n_init = n_init
+ self.reassignment_ratio = reassignment_ratio
+ self.train_data = None
+ self.valid_data = None
+ self.n_samples = None
+ self.n_clusters = None
+
+ def load_data(self) -> dict:
+ train_data = load_data_for_range(self.data_path, self.train_start, self.train_end)
+ valid_data = load_data_for_range(self.data_path, self.valid_start, self.valid_end)
+ return {"train": train_data, "valid": valid_data}
+
+ def initialize(self, parts: dict, fl_ctx: FLContext):
+ data = self.load_data()
+ self.train_data = data["train"]
+ self.valid_data = data["valid"]
+ # train data size, to be used for setting
+ # NUM_STEPS_CURRENT_ROUND for potential use in aggregation
+ self.n_samples = data["train"][-1]
+ # note that the model needs to be created every round
+ # due to the available API for center initialization
+
+ def train(self, curr_round: int, global_param: Optional[dict], fl_ctx: FLContext) -> Tuple[dict, dict]:
+ # get training data, note that clustering is unsupervised
+ # so only x_train will be used
+ (x_train, y_train, train_size) = self.train_data
+ if curr_round == 0:
+ # first round, compute initial center with kmeans++ method
+ # model will be None for this round
+ self.n_clusters = global_param["n_clusters"]
+ center_local, _ = kmeans_plusplus(x_train, n_clusters=self.n_clusters, random_state=self.random_state)
+ kmeans = None
+ params = {"center": center_local, "count": None}
+ else:
+ center_global = global_param["center"]
+ # following rounds, local training starting from global center
+ kmeans = MiniBatchKMeans(
+ n_clusters=self.n_clusters,
+ batch_size=self.n_samples,
+ max_iter=self.max_iter,
+ init=center_global,
+ n_init=self.n_init,
+ reassignment_ratio=self.reassignment_ratio,
+ random_state=self.random_state,
+ )
+ kmeans.fit(x_train)
+ center_local = kmeans.cluster_centers_
+ count_local = kmeans._counts
+ params = {"center": center_local, "count": count_local}
+ return params, kmeans
+
+ def validate(self, curr_round: int, global_param: Optional[dict], fl_ctx: FLContext) -> Tuple[dict, dict]:
+ # local validation with global center
+ # fit a standalone KMeans with just the given center
+ center_global = global_param["center"]
+ kmeans_global = KMeans(n_clusters=self.n_clusters, init=center_global, n_init=1)
+ kmeans_global.fit(center_global)
+ # get validation data, both x and y will be used
+ (x_valid, y_valid, valid_size) = self.valid_data
+ y_pred = kmeans_global.predict(x_valid)
+ homo = homogeneity_score(y_valid, y_pred)
+ self.log_info(fl_ctx, f"Homogeneity {homo:.4f}")
+ metrics = {"Homogeneity": homo}
+ return metrics, kmeans_global
+
+ def finalize(self, fl_ctx: FLContext) -> None:
+ # freeing resources in finalize
+ del self.train_data
+ del self.valid_data
+ self.log_info(fl_ctx, "Freed training resources")
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/utils/prepare_data.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/utils/prepare_data.py
new file mode 100644
index 0000000000..cfc12462d1
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/code/utils/prepare_data.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+from typing import Optional
+
+import numpy as np
+import pandas as pd
+from sklearn import datasets
+
+
+def load_data(dataset_name: str = "iris"):
+ if dataset_name == "iris":
+ dataset = datasets.load_iris()
+ elif dataset_name == "cancer":
+ dataset = datasets.load_breast_cancer()
+ else:
+ raise ValueError("Dataset unknown!")
+ return dataset
+
+
+def prepare_data(
+ output_dir: str,
+ dataset_name: str = "iris",
+ randomize: bool = False,
+ filename: Optional[str] = None,
+ file_format="csv",
+):
+ # Load data
+ dataset = load_data(dataset_name)
+ x = dataset.data
+ y = dataset.target
+ if randomize:
+ np.random.seed(0)
+ idx_random = np.random.permutation(len(y))
+ x = x[idx_random, :]
+ y = y[idx_random]
+
+ data = np.column_stack((y, x))
+ df = pd.DataFrame(data=data)
+
+ # Check if the target folder exists,
+ # If not, create
+
+ if os.path.exists(output_dir) and not os.path.isdir(output_dir):
+        os.remove(output_dir)  # the existing path is a file, not a directory; remove it so the folder can be created
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Save to csv file
+ filename = filename if filename else f"{dataset_name}.csv"
+ if file_format == "csv":
+ file_path = os.path.join(output_dir, filename)
+
+ df.to_csv(file_path, sep=",", index=False, header=False)
+ else:
+ raise NotImplementedError
+
+
+def main():
+ parser = argparse.ArgumentParser(description="Load sklearn data and save to csv")
+ parser.add_argument("--dataset_name", type=str, choices=["iris", "cancer"], help="Dataset name")
+ parser.add_argument("--randomize", type=int, help="Whether to randomize data sequence")
+ parser.add_argument("--out_path", type=str, help="Path to output data file")
+ args = parser.parse_args()
+
+ output_dir = os.path.dirname(args.out_path)
+ filename = os.path.basename(args.out_path)
+ prepare_data(output_dir, args.dataset_name, args.randomize, filename)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb
index 930a53239b..f25d8d2af1 100644
--- a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb
@@ -1,19 +1,162 @@
{
"cells": [
+ {
+ "cell_type": "markdown",
+ "id": "7d7767c9",
+ "metadata": {},
+ "source": [
+ "# Federated K-Means Clustering with Scikit-learn on Iris Dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f635ea04",
+ "metadata": {},
+ "source": [
+ "## Introduction to Scikit-learn, tabular data, and federated k-Means\n",
+ "### Scikit-learn\n",
+ "This example shows how to use [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) on tabular data.\n",
+ "It uses [Scikit-learn](https://scikit-learn.org/),\n",
+ "a widely used open-source machine learning library that supports supervised \n",
+ "and unsupervised learning.\n",
+ "### Tabular data\n",
+ "The data used in this example is tabular in a format that can be handled by [pandas](https://pandas.pydata.org/), such that:\n",
+ "- rows correspond to data samples\n",
+ "- the first column represents the label \n",
+ "- the other columns cover the features. \n",
+ "\n",
+ "Each client is expected to have one local data file containing both training \n",
+ "and validation samples. To load the data for each client, the following \n",
+ "parameters are expected by the local learner:\n",
+ "- data_file_path: string, the full path to the client's data file \n",
+ "- train_start: int, start row index for the training set\n",
+ "- train_end: int, end row index for the training set\n",
+ "- valid_start: int, start row index for the validation set\n",
+ "- valid_end: int, end row index for the validation set\n",
+ "\n",
+ "### Federated k-Means clustering\n",
+ "The machine learning algorithm in this example is [k-Means clustering](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).\n",
+ "\n",
+ "The aggregation follows the scheme defined in [Mini-batch k-Means](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html). \n",
+ "\n",
+ "Under this setting, each round of federated learning can be formulated as follows:\n",
+ "- local training: starting from global centers, each client trains a local MiniBatchKMeans model with their own data\n",
+    "- local training: starting from the global centers, each client trains a local MiniBatchKMeans model on its own data\n",
+    "- global aggregation: the server collects the cluster center and count information from all clients, aggregates them by treating each client's result as a mini-batch, and updates the global center and per-center counts (a small sketch of this update follows below).\n",
+ "For center initialization, at the first round, each client generates its initial centers with the k-means++ method. Then, the server collects all initial centers and performs one round of k-means to generate the initial global center.\n",
+ "\n",
+    "Below are the steps to run this example."
+ ]
+ },
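+  {
+   "cell_type": "markdown",
+   "id": "a7c3f1e9",
+   "metadata": {},
+   "source": [
+    "As a rough illustration of the aggregation step (a minimal sketch, not the exact NVFlare assembler code), the per-center mini-batch update can be written as follows, assuming each client submits a `center` array of shape `(n_clusters, n_features)` and a `count` array of positive per-center counts:\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def update_global_centers(global_center, global_count, client_results):\n",
+    "    # treat each client's (center, count) submission as one mini-batch of the global model\n",
+    "    for center, count in client_results:\n",
+    "        for j in range(global_center.shape[0]):\n",
+    "            total = global_count[j] + count[j]\n",
+    "            global_center[j] = (global_count[j] * global_center[j] + count[j] * center[j]) / total\n",
+    "            global_count[j] = total\n",
+    "    return global_center, global_count\n",
+    "```\n",
+    "The actual server-side logic used in this example lives in [kmeans_assembler.py](code/src/kmeans_assembler.py)."
+   ]
+  },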
+ {
+ "cell_type": "markdown",
+ "id": "ce92018e",
+ "metadata": {},
+ "source": [
+ "## Install requirements\n",
+ "First, install the required packages:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e08b25db",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "%pip install -r code/requirements.txt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31c22f7d",
+ "metadata": {},
+ "source": [
+ "## Download and prepare data\n",
+ "This example uses the Iris dataset available from Scikit-learn's dataset API. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e6c3b765",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%env DATASET_PATH=/tmp/nvflare/dataset/sklearn_iris.csv\n",
+ "! python3 ./code/utils/prepare_data.py --dataset_name iris --out_path ${DATASET_PATH}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6a1fefd8",
+ "metadata": {},
+ "source": [
+    "This will load the data, order the label and feature columns, optionally randomize the sample order (when `--randomize 1` is passed), and save it as a header-free, comma-separated CSV file. \n",
+ "The default path is `/tmp/nvflare/dataset/sklearn_iris.csv`. \n",
+ "\n",
+ "Note that the dataset contains a label for each sample, which will not be \n",
+ "used for training since k-Means clustering is an unsupervised method. \n",
+ "The entire dataset with labels will be used for performance evaluation \n",
+ "based on [homogeneity_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.homogeneity_score.html)."
+ ]
+ },
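+  {
+   "cell_type": "markdown",
+   "id": "c5e1a2b3",
+   "metadata": {},
+   "source": [
+    "As an optional sanity check (assuming the default output path above), you can peek at the prepared file with pandas; the first column is the label and the remaining columns are the features:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d6f2b3c4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "df = pd.read_csv(\"/tmp/nvflare/dataset/sklearn_iris.csv\", header=None)\n",
+    "print(df.shape)\n",
+    "df.head()"
+   ]
+  },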
+ {
+ "cell_type": "markdown",
+ "id": "cf161c43",
+ "metadata": {},
+ "source": [
+ "## Run simulated kmeans experiment\n",
+ "We can run the federated training using the NVFlare Simulator with the JobAPI:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a2a8f0ee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "! python3 code/kmeans_job.py --num_clients 3 --split_mode uniform"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b9fdb72",
+ "metadata": {},
+ "source": [
+ "With the default arguments, [kmeans_job.py](code/kmeans_job.py) will export the job to `/tmp/nvflare/workspace/jobs/kmeans` and then the job will be run with a workspace directory of `/tmp/nvflare/workspace/works/kmeans`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fb48af70",
+ "metadata": {},
+ "source": [
+ "## Result visualization\n",
+    "Model accuracy is computed as the homogeneity score between the clusters formed and the ground-truth labels, which can be visualized in TensorBoard."
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
- "id": "94a3e985-bc57-4973-b43f-867cf94ced6c",
+ "id": "88d9f366",
"metadata": {},
"outputs": [],
- "source": []
+ "source": [
+ "%load_ext tensorboard\n",
+ "%tensorboard --logdir /tmp/nvflare/workspace/works/kmeans/sklearn_kmeans_uniform_3_clients"
+ ]
}
],
"metadata": {
"kernelspec": {
- "display_name": "nvflare_example",
+ "display_name": ".venv",
"language": "python",
- "name": "nvflare_example"
+ "name": "python3"
},
"language_info": {
"codemirror_mode": {
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/convert_ml_to_fl.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/convert_ml_to_fl.ipynb
new file mode 100644
index 0000000000..efef1032b9
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.3_convert_machine_learning_to_federated_learning/convert_ml_to_fl.ipynb
@@ -0,0 +1,34 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Simple ML/DL to FL transition with NVFlare\n",
+ "\n",
+ "Converting Deep Learning (DL) models to Federated Learning (FL) entails several key steps:\n",
+ "\n",
+ " - Formulating the algorithm: This involves determining how to adapt a DL model into an FL framework, including specifying the information exchange protocol between the server and clients.\n",
+ "\n",
+ " - Code conversion: Adapting existing standalone DL code into FL-compatible code. This typically involves minimal changes, often just a few lines of code, thanks to tools like NVFlare.\n",
+ "\n",
+ " - Workflow configuration: Once the code is modified, configuring the workflow to integrate the newly adapted FL code seamlessly.\n",
+ "\n",
+ "NVFlare simplifies the process of transitioning from traditional Machine Learning (ML) or DL algorithms to FL. With NVFlare, the conversion process requires only minor code adjustments.\n",
+ "\n",
+ "In this section, we have the following three examples for converting traditional ML to FL:\n",
+ "\n",
+    " * [Convert Logistic Regression to federated learning](02.3.1_convert_logistic_regression_to_federated_learning/convert_logistic_regression_to_fl.ipynb)\n",
+ " * [Convert KMeans to federated learning](02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb)\n",
+ " * [Convert Survival Analysis to federated learning](02.3.3_convert_survival_analysis_to_federated_learning/convert_survival_analysis_to_fl.ipynb)"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/Client_api.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/Client_api.ipynb
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/client_api.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/client_api.ipynb
new file mode 100644
index 0000000000..3a0c46da26
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/client_api.ipynb
@@ -0,0 +1,346 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "58149c32",
+ "metadata": {},
+ "source": [
+ "# Transform Existing Code to FL Easily with the FLARE Client API"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "06203527",
+ "metadata": {},
+ "source": [
+ "The FLARE Client API provides an easy way to convert centralized, local training code into federated learning code with just a few lines of code changes.\n",
+ "\n",
+    "Most of the previous examples up to this point have already used the Client API, but in this section we focus on its core concepts and explain some of the ways it can be configured so you can use it more effectively.\n",
+ "\n",
+    "You can find detailed examples of actual integrations with deep learning platforms, including PyTorch and TensorFlow, here: https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/ml-to-fl"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "be7efa36",
+ "metadata": {},
+ "source": [
+ "## Core Concept"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "76102eac",
+ "metadata": {},
+ "source": [
+    "The general structure of the popular federated learning (FL) workflow \"FedAvg\" is as follows:\n",
+ "\n",
+ "1. **FL server initializes an initial model**\n",
+ "2. **For each round (global iteration):**\n",
+ " 1. FL server sends the global model to clients\n",
+ " 2. Each FL client starts with this global model and trains on their own data\n",
+ " 3. Each FL client sends back their trained model\n",
+ " 4. FL server aggregates all the models and produces a new global model\n",
+ "\n",
+ "On the client side, the training workflow is as follows:\n",
+ "\n",
+ "1. Receive the model from the FL server\n",
+ "2. Perform local training on the received global model and/or evaluate the received global model for model selection\n",
+ "3. Send the new model back to the FL server"
+ ]
+ },
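+  {
+   "cell_type": "markdown",
+   "id": "e3b1c2d4",
+   "metadata": {},
+   "source": [
+    "To make the flow concrete, here is a toy, self-contained illustration of the FedAvg round structure using NumPy arrays as stand-in \"models\" (this is only a sketch of the idea, not NVFlare's implementation):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d4e5f6a7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(0)\n",
+    "\n",
+    "def local_train(model, rng):\n",
+    "    # pretend local training: nudge the received global model\n",
+    "    return model + rng.normal(scale=0.1, size=model.shape)\n",
+    "\n",
+    "global_model = np.zeros(4)  # server initializes a model\n",
+    "for _ in range(3):          # global rounds\n",
+    "    client_models = [local_train(global_model, rng) for _ in range(2)]  # clients train locally\n",
+    "    global_model = np.mean(client_models, axis=0)                       # server aggregates\n",
+    "print(global_model)"
+   ]
+  },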
+ {
+ "cell_type": "markdown",
+ "id": "50e2b7dd",
+ "metadata": {},
+ "source": [
+    "To convert centralized training code to federated learning, we need to\n",
+ "adapt the code to do the following steps:\n",
+ "\n",
+ "1. Obtain the required information from the received `fl_model`\n",
+ "2. Run local training\n",
+ "3. Put the results in a new `fl_model` to be sent back\n",
+ "\n",
+ "For a general use case, there are three essential methods for the Client API:\n",
+ "\n",
+ "* ``init()``: Initializes NVFlare Client API environment.\n",
+ "* ``receive()``: Receives model from NVFlare side.\n",
+ "* ``send()``: Sends the model to NVFlare side.\n",
+ "\n",
+ "You can use the Client API to change centralized training code to\n",
+ "federated learning, for example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9f21ee16",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import nvflare.client as flare\n",
+ "\n",
+ "flare.init() # 1. Initializes NVFlare Client API environment.\n",
+ "input_model = flare.receive() # 2. Receives model from NVFlare side.\n",
+ "params = input_model.params # 3. Obtain the required information from received FLModel\n",
+ "\n",
+ "# original local training code begins\n",
+ "new_params = local_train(params)\n",
+ "# original local training code ends\n",
+ "\n",
+ "output_model = flare.FLModel(params=new_params) # 4. Put the results in a new FLModel\n",
+ "flare.send(output_model) # 5. Sends the model to NVFlare side."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "494e4079",
+ "metadata": {},
+ "source": [
+ "With 5 lines of code changes, we convert the centralized training code to work in a\n",
+ "federated learning setting.\n",
+ "\n",
+ "After this, we can use the job templates and the Job CLI\n",
+ "to generate a job and export it to run on a deployed NVFlare system or directly run the job using FL Simulator.\n",
+ "\n",
+ "To see a table of the key Client APIs, see the [Client API documentation in the programming guide](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html#id2).\n",
+ "\n",
+ "Please consult the [Client API Module](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.client.api.html) for more in-depth information about all of the Client API functions.\n",
+ "\n",
+ "If you are using PyTorch Lightning in your training code, you can check the [Lightning API Module](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_opt.lightning.api.html). Also, be sure to look through the [Convert Torch Lightning to FL notebook](../02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb) and related code."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4a09d80e",
+ "metadata": {},
+ "source": [
+ "## Advanced User Options: Client API with Different Implementations\n",
+ "\n",
+ "Within the Client API, we offer multiple implementations tailored to diverse requirements:\n",
+ "\n",
+ "* In-process Client API: In this setup, the client training script operates within the same process as the NVFlare Client job.\n",
+ "This configuration, utilizing the ```InProcessClientAPIExecutor```, offers shared memory usage and is efficient with simple configuration. \n",
+ "This is the default for `ScriptRunner` since by default `launch_external_process=False`. Use this configuration for development or single GPU training.\n",
+ "\n",
+ "* Sub-process Client API: Here, the client training script runs in a separate subprocess.\n",
+ "Utilizing the ```ClientAPILauncherExecutor```, this option offers flexibility in communication mechanisms:\n",
+ " * Communication via CellPipe (default)\n",
+ " * Communication via FilePipe (no capability to stream metrics for experiment tracking) \n",
+ "This configuration is ideal for scenarios requiring multi-GPU or distributed PyTorch training.\n",
+ "\n",
+ "Choose the option best suited to your specific requirements and workflow preferences.\n",
+ "\n",
+ "These implementations can be easily configured using the JobAPI's `ScriptRunner`.\n",
+ "By default, the ```InProcessClientAPIExecutor``` is used, however setting `launch_external_process=True` uses the ```ClientAPILauncherExecutor```\n",
+ "with pre-configured CellPipes for communication and metrics streaming."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b3ac92dd",
+ "metadata": {},
+ "source": [
+ "## NVFlare Client API Job with NumPy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "832a4b34",
+ "metadata": {},
+ "source": [
+ "In this example we use simple NumPy scripts to showcase the Client API with the `ScriptRunner` for both in-process and sub-process settings. With NumPy, only nvflare is needed so you do not have to install any additional dependencies.\n",
+ "\n",
+    "The default mode of the `ScriptRunner` uses `InProcessClientAPIExecutor`, with the client training script operating within the same process as the NVFlare Client job. Below, we first show a script that sends back the full model parameters and then one that sends back parameter differences, before explaining metrics streaming and showing how to launch the same scripts with the Sub-process Client API."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c1af7da6",
+ "metadata": {},
+ "source": [
+ "### Send model parameters back to the NVFlare server\n",
+ "\n",
+ "We use the mock training script in [train_full.py](code/src/train_full.py)\n",
+ "and send back the FLModel with `params_type=\"FULL\"`.\n",
+ "\n",
+ "After we modify our training script, we can create a job using the ScriptRunner: [np_client_api_job.py](code/np_client_api_job.py).\n",
+ "\n",
+ "The script will run the job using the simulator with the Job API by default:"
+ ]
+ },
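+  {
+   "cell_type": "markdown",
+   "id": "f1a2b3c4",
+   "metadata": {},
+   "source": [
+    "For reference, a minimal sketch of what such a script can look like (the real file is [train_full.py](code/src/train_full.py); the \"training\" here is just a placeholder update):\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import nvflare.client as flare\n",
+    "\n",
+    "flare.init()\n",
+    "while flare.is_running():\n",
+    "    input_model = flare.receive()  # global model from the server\n",
+    "    new_params = {k: np.asarray(v) + 1 for k, v in input_model.params.items()}  # mock training step\n",
+    "    flare.send(flare.FLModel(params=new_params, params_type=\"FULL\"))  # send full parameters back\n",
+    "```"
+   ]
+  },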
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7952cab8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! python3 code/np_client_api_job.py --script code/src/train_full.py"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5076b478",
+ "metadata": {},
+ "source": [
+ "To instead export the job configuration to use in other modes, run the script with the flag `--export_config`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "efdceecd",
+ "metadata": {},
+ "source": [
+ "### Send model parameters differences back to the NVFlare server\n",
+ "\n",
+ "We can send model parameter differences back to the NVFlare server by calculating the parameters differences and sending it back: [train_diff.py](code/src/train_diff.py)\n",
+ "\n",
+ "Note that we set the `params_type` to `DIFF` when creating `flare.FLModel`.\n",
+ "\n",
+ "Then we can run it using the NVFlare Simulator:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "737d8b7c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! python3 code/np_client_api_job.py --script code/src/train_diff.py"
+ ]
+ },
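+ {
+ "cell_type": "markdown",
+ "id": "5e0d2f18",
+ "metadata": {},
+ "source": [
+ "Compared to the full-parameter script, only the payload that is sent back changes (excerpted from [train_diff.py](code/src/train_diff.py)):\n",
+ "\n",
+ "```python\n",
+ "# inside the federated loop of train_diff.py\n",
+ "diff = output_numpy_array - input_numpy_array        # parameter difference\n",
+ "flare.send(\n",
+ "    flare.FLModel(\n",
+ "        params={\"numpy_key\": diff},\n",
+ "        params_type=\"DIFF\",                          # mark the payload as a difference\n",
+ "        metrics={\"accuracy\": metrics},\n",
+ "        current_round=input_model.current_round,\n",
+ "    )\n",
+ ")\n",
+ "```"
+ ]
+ },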
+ {
+ "cell_type": "markdown",
+ "id": "550479f2",
+ "metadata": {},
+ "source": [
+ "### Metrics streaming\n",
+ "\n",
+ "We already showed an example with metrics streaming in section 01.5 of Chapter 1 in Part 1, but this is a simple example with the Client API for streaming the training progress to the server with `MLflowWriter`.\n",
+ "\n",
+ "NVFlare supports the following writers:\n",
+ "\n",
+ " - `SummaryWriter` mimics Tensorboard `SummaryWriter`'s `add_scalar`, `add_scalars` method\n",
+ " - `WandBWriter` mimics Weights And Biases's `log` method\n",
+ " - `MLflowWriter` mimics MLflow's tracking api\n",
+ "\n",
+ "In this example we use `MLflowWriter` in [train_metrics.py](code/src/train_metrics.py) and configure a corresponding `MLflowReceiver` in the job script [np_client_api_job.py](code/np_client_api_job.py)\n",
+ "\n",
+ "Then we can run it using the NVFlare Simulator:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d27f9b3a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! python3 code/np_client_api_job.py --script code/src/train_metrics.py"
+ ]
+ },
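+ {
+ "cell_type": "markdown",
+ "id": "7aa1c3be",
+ "metadata": {},
+ "source": [
+ "The metric logging in [train_metrics.py](code/src/train_metrics.py) reduces to creating an `MLflowWriter` and calling `log_metric` inside the training loop (a condensed sketch; `num_steps` stands in for the nested epoch/batch loops of the actual script):\n",
+ "\n",
+ "```python\n",
+ "from nvflare.client.tracking import MLflowWriter\n",
+ "\n",
+ "writer = MLflowWriter()\n",
+ "for step in range(num_steps):\n",
+ "    writer.log_metric(key=\"global_step\", value=step, step=step)\n",
+ "```"
+ ]
+ },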
+ {
+ "cell_type": "markdown",
+ "id": "3774d943",
+ "metadata": {},
+ "source": [
+ "After the experiment is finished, you can view the results by running the the mlflow command: `mlflow ui --port 5000` inside the directory `/tmp/nvflare/jobs/workdir/server/simulate_job/`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eadee3dd",
+ "metadata": {},
+ "source": [
+ "## Sub-process Client API\n",
+ "\n",
+ "The `ScriptRunner` with `launch_external_process=True` uses the `ClientAPILauncherExecutor` for external process script execution.\n",
+ "This configuration is ideal for scenarios requiring third-party integrations, multi-GPU or distributed PyTorch training, or if additional processes are needed for training."
+ ]
+ },
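+ {
+ "cell_type": "markdown",
+ "id": "2d9b4e61",
+ "metadata": {},
+ "source": [
+ "In the job script, switching to the sub-process mode only changes the `ScriptRunner` configuration; the rest of the job wiring stays the same (excerpted from [np_client_api_job.py](code/np_client_api_job.py), where `launch_process` comes from the `--launch_process` command-line flag):\n",
+ "\n",
+ "```python\n",
+ "executor = ScriptRunner(\n",
+ "    script=script,\n",
+ "    launch_external_process=launch_process,  # True -> ClientAPILauncherExecutor (sub-process)\n",
+ "    framework=FrameworkType.NUMPY,\n",
+ ")\n",
+ "job.to_clients(executor)\n",
+ "```"
+ ]
+ },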
+ {
+ "cell_type": "markdown",
+ "id": "ade375cd",
+ "metadata": {},
+ "source": [
+ "### Launching the script\n",
+ "\n",
+ "When launching a script in an external process, it is launched once for the entire job.\n",
+ "We must ensure our training script [train_full.py](code/src/train_full.py) is in a loop to support this.\n",
+ "\n",
+ "Then we can run it using the NVFlare Simulator:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0b3ee641",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! python3 code/np_client_api_job.py --script code/src/train_full.py --launch_process"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5006b87a",
+ "metadata": {},
+ "source": [
+ "### Metrics streaming\n",
+ "\n",
+ "In this example we use `MLflowWriter` in [train_metrics.py](code/src/train_metrics.py) and configure a corresponding `MLflowReceiver` in the job script [np_client_api_job.py](code/np_client_api_job.py)\n",
+ "\n",
+ "Then we can run it using the NVFlare Simulator:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "071834d0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! python3 np_client_api_job.py --script src/train_metrics.py --launch_process"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "95504253",
+ "metadata": {},
+ "source": [
+ "If you want to see example code with actual integration with PyTorch and TensorFlow, you can find it in the [Hello World ML to FL](https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/ml-to-fl) section of the examples."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c56e633e",
+ "metadata": {},
+ "source": [
+ "With this, we are at the end of Chapter 2. The [next notebook](../02.5_recap/recap.ipynb) is a reacap of this chapter."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/np_client_api_job.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/np_client_api_job.py
new file mode 100644
index 0000000000..b2a09c5968
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/np_client_api_job.py
@@ -0,0 +1,82 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+
+from nvflare import FedJob
+from nvflare.app_common.np.np_model_persistor import NPModelPersistor
+from nvflare.app_common.workflows.fedavg import FedAvg
+from nvflare.app_opt.tracking.mlflow.mlflow_receiver import MLflowReceiver
+from nvflare.job_config.script_runner import FrameworkType, ScriptRunner
+
+
+def define_parser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--n_clients", type=int, default=2)
+ parser.add_argument("--num_rounds", type=int, default=5)
+ parser.add_argument("--script", type=str, default="src/train_full.py")
+ parser.add_argument("--launch_process", action=argparse.BooleanOptionalAction, default=False)
+ parser.add_argument("--export_config", action=argparse.BooleanOptionalAction, default=False)
+
+ return parser.parse_args()
+
+
+def main():
+ # define local parameters
+ args = define_parser()
+
+ n_clients = args.n_clients
+ num_rounds = args.num_rounds
+ script = args.script
+ launch_process = args.launch_process
+ export_config = args.export_config
+
+ job = FedJob(name="np_client_api")
+
+ persistor_id = job.to_server(NPModelPersistor(), "persistor")
+
+ # Define the controller workflow and send to server
+ controller = FedAvg(num_clients=n_clients, num_rounds=num_rounds, persistor_id=persistor_id)
+ job.to_server(controller)
+
+ # Add MLflow Receiver for metrics streaming; match on the file name so the
+ # script can be passed with any path prefix (e.g. code/src/train_metrics.py)
+ if script.endswith("train_metrics.py"):
+ receiver = MLflowReceiver(
+ tracking_uri="file:///tmp/nvflare/jobs/workdir/server/simulate_job/mlruns",
+ kw_args={
+ "experiment_name": "nvflare-fedavg-np-experiment",
+ "run_name": "nvflare-fedavg-np-with-mlflow",
+ "experiment_tags": {"mlflow.note.content": "## **NVFlare FedAvg Numpy experiment with MLflow**"},
+ "run_tags": {"mlflow.note.content": "## Federated Experiment tracking with MLflow.\n"},
+ },
+ artifact_location="artifacts",
+ events=["fed.analytix_log_stats"],
+ )
+ job.to_server(receiver)
+
+ executor = ScriptRunner(
+ script=script,
+ launch_external_process=launch_process,
+ framework=FrameworkType.NUMPY,
+ )
+ job.to_clients(executor)
+
+ if export_config:
+ job.export_job("/tmp/nvflare/jobs/job_config")
+ else:
+ job.simulator_run("/tmp/nvflare/jobs/workdir", n_clients=n_clients, gpu="0")
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_diff.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_diff.py
new file mode 100755
index 0000000000..d7dc02ee45
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_diff.py
@@ -0,0 +1,71 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import copy
+
+import nvflare.client as flare
+
+
+def train(input_arr):
+ output_arr = copy.deepcopy(input_arr)
+ # mock training with plus 1
+ return output_arr + 1
+
+
+def evaluate(input_arr):
+ # mock evaluation metrics
+ return 100
+
+
+def main():
+ # initializes NVFlare interface
+ flare.init()
+
+ # get system information
+ sys_info = flare.system_info()
+ print(f"system info is: {sys_info}")
+
+ while flare.is_running():
+
+ # get model from NVFlare
+ input_model = flare.receive()
+ print(f"received weights is: {input_model.params}")
+
+ input_numpy_array = input_model.params["numpy_key"]
+
+ # training
+ output_numpy_array = train(input_numpy_array)
+
+ # evaluation
+ metrics = evaluate(input_numpy_array)
+
+ print(f"finish round: {input_model.current_round}")
+
+ # calculate difference here
+ diff = output_numpy_array - input_numpy_array
+
+ # send back the model difference
+ print(f"send back: {diff}")
+ flare.send(
+ flare.FLModel(
+ params={"numpy_key": diff},
+ params_type="DIFF",
+ metrics={"accuracy": metrics},
+ current_round=input_model.current_round,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_full.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_full.py
new file mode 100755
index 0000000000..e1598275b5
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_full.py
@@ -0,0 +1,68 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import copy
+
+import nvflare.client as flare
+
+
+def train(input_arr):
+ output_arr = copy.deepcopy(input_arr)
+ # mock training with plus 1
+ return output_arr + 1
+
+
+def evaluate(input_arr):
+ # mock evaluation metrics
+ return 100
+
+
+def main():
+ # initializes NVFlare interface
+ flare.init()
+
+ # get system information
+ sys_info = flare.system_info()
+ print(f"system info is: {sys_info}")
+
+ while flare.is_running():
+
+ # get model from NVFlare
+ input_model = flare.receive()
+ print(f"received weights is: {input_model.params}", flush=True)
+
+ input_numpy_array = input_model.params["numpy_key"]
+
+ # training
+ output_numpy_array = train(input_numpy_array)
+
+ # evaluation
+ metrics = evaluate(input_numpy_array)
+
+ print(f"finish round: {input_model.current_round}", flush=True)
+
+ # send back the model
+ print(f"send back: {output_numpy_array}", flush=True)
+ flare.send(
+ flare.FLModel(
+ params={"numpy_key": output_numpy_array},
+ params_type="FULL",
+ metrics={"accuracy": metrics},
+ current_round=input_model.current_round,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_metrics.py b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_metrics.py
new file mode 100755
index 0000000000..c6b24e877d
--- /dev/null
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.4_client_api/code/src/train_metrics.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import copy
+import time
+
+import nvflare.client as flare
+from nvflare.client.tracking import MLflowWriter
+
+
+def train(input_arr, current_round, epochs=3):
+ writer = MLflowWriter()
+ output_arr = copy.deepcopy(input_arr)
+ num_of_data = 2000
+ batch_size = 16
+ num_of_batches = num_of_data // batch_size
+ for i in range(epochs):
+ for j in range(num_of_batches):
+ global_step = current_round * num_of_batches * epochs + i * num_of_batches + j
+ writer.log_metric(
+ key="global_step",
+ value=global_step,
+ step=global_step,
+ )
+ print(f"logged records from epoch: {i}")
+ # mock training with plus 1
+ output_arr += 1
+ # assume each epoch takes 1 second
+ time.sleep(1.0)
+ return output_arr
+
+
+def evaluate(input_arr):
+ # mock evaluation metrics
+ return 100
+
+
+def main():
+ # initializes NVFlare interface
+ flare.init()
+
+ # get system information
+ sys_info = flare.system_info()
+ print(f"system info is: {sys_info}")
+
+ while flare.is_running():
+ input_model = flare.receive()
+ print(f"received weights is: {input_model.params}")
+
+ input_numpy_array = input_model.params["numpy_key"]
+
+ # training
+ output_numpy_array = train(input_numpy_array, current_round=input_model.current_round, epochs=3)
+
+ # evaluation
+ metrics = evaluate(input_numpy_array)
+
+ print(f"finish round: {input_model.current_round}")
+
+ # send back the model
+ print(f"send back: {output_numpy_array}")
+ flare.send(
+ flare.FLModel(
+ params={"numpy_key": output_numpy_array},
+ params_type="FULL",
+ metrics={"accuracy": metrics},
+ current_round=input_model.current_round,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.5_recap/recap.ipynb b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.5_recap/recap.ipynb
index 6914bd22db..77f82177a4 100644
--- a/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.5_recap/recap.ipynb
+++ b/examples/tutorials/self-paced-training/part-1_federated_learning_introduction/chapter-2_develop_federated_learning_applications/02.5_recap/recap.ipynb
@@ -1,19 +1,51 @@
{
"cells": [
{
- "cell_type": "code",
- "execution_count": null,
- "id": "b61e9f9c-e1fd-4091-a2c2-a43a780f01b0",
+ "cell_type": "markdown",
+ "id": "7b152728-3366-4432-adb1-29aa3051dc22",
"metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "# Summary of Chapter 2\n",
+ "\n",
+ "We covered developing federated learning applications in Chapter 2. Here is an overview:\n",
+ "\n",
+ "1. **Federated Statistics**\n",
+ " - **Federated Statistics with Image Data**: How to compute local and global image statistics with the consideration that data is private at each of the client sites.\n",
+ " - [federated_statistics_with_image_data.ipynb](../02.1_federated_statistics/federated_statistics_with_image_data/federated_statistics_with_image_data.ipynb)\n",
+ " - **Federated Statistics with Tabular Data**: How to create federated statistics for data that can be represented as Pandas DataFrames.\n",
+ " - [federated_statistics_with_tabular_data.ipynb](../02.1_federated_statistics/federated_statistics_with_tabular_data/federated_statistics_with_tabular_data.ipynb)\n",
+ "\n",
+ "2. **Converting PyTorch Lightning to FL**\n",
+ " - **PyTorch Lightning to FL**: Guide on converting PyTorch Lightning scripts to federated learning.\n",
+ " - [convert_torch_lightning_to_fl.ipynb](../02.2_convert_torch_lightning_to_federated_learning/convert_torch_lightning_to_fl.ipynb)\n",
+ "\n",
+ "3. **Simple ML/DL to FL transition with NVFlare**\n",
+ " - **Converting Logistic Regression to FL**: How to implement a federated binary classification via logistic regression with second-order Newton-Raphson optimization. \n",
+ " - [convert_logistic_regression_to_fl.ipynb](../02.3_convert_machine_learning_to_federated_learning/02.3.1_convert_logistic_regression_to_federated_learning/convert_logistic_regression_to_fl.ipynb)\n",
+ " - **Converting KMeans to FL**: ADD CONTENT HERE. \n",
+ " - [convert_kmeans_to_fl.ipynb](../02.3_convert_machine_learning_to_federated_learning/02.3.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb)\n",
+ " - **Secure Federated Kaplan-Meier Analysis via Time-Binning and Homomorphic Encryption**: ADD CONTENT HERE. \n",
+ " - [convert_survival_analysis_to_fl.ipynb](../02.3_convert_machine_learning_to_federated_learning/02.3.3_convert_survival_analysis_to_federated_learning/convert_survival_analysis_to_fl.ipynb)\n",
+ "\n",
+ "4. **Client API**\n",
+ " - **Client API**: Here we focus on the core concepts of the Client API and explain how to configure it to run within the same process or in a separate subprocess. \n",
+ " - [client_api.ipynb](../02.4_client_api/client_api.ipynb)\n",
+ "\n",
+ "5. **Recap of Covered Topics**\n",
+ " - **Summary and Recap**: A recap of the topics covered in the previous sections.\n",
+ "\n",
+ "Each section is designed to provide comprehensive guidance and practical examples to help you implement and customize federated learning in your applications. For detailed instructions and examples, refer to the respective notebooks linked in each section.\n",
+ "\n",
+ "\n",
+ "Now let's move on to the [Chapter 3](../../../part-2_federated_learning_system/chapter-3_federated_computing_platform/03.0_introduction/introduction.ipynb)."
+ ]
}
],
"metadata": {
"kernelspec": {
- "display_name": "nvflare_example",
+ "display_name": "Python 3",
"language": "python",
- "name": "nvflare_example"
+ "name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -25,7 +57,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.2"
+ "version": "3.10.12"
}
},
"nbformat": 4,
diff --git a/nvflare/app_opt/flower/applet.py b/nvflare/app_opt/flower/applet.py
index 50836ce36f..3bd0958fa8 100644
--- a/nvflare/app_opt/flower/applet.py
+++ b/nvflare/app_opt/flower/applet.py
@@ -238,7 +238,7 @@ def _run_flower_command(self, command: str):
success = result.get("success", False)
if not success:
- err = f"failed command '{command}': {success=}"
+ err = f"failed command '{command}': {success=} {result=}"
self.logger.error(err)
raise RuntimeError(err)
diff --git a/nvflare/app_opt/lightning/api.py b/nvflare/app_opt/lightning/api.py
index 4e674e5915..45629a6b42 100644
--- a/nvflare/app_opt/lightning/api.py
+++ b/nvflare/app_opt/lightning/api.py
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+import logging
from typing import Dict
import pytorch_lightning as pl
@@ -29,7 +30,9 @@
FL_META_KEY = "__fl_meta__"
-def patch(trainer: pl.Trainer, restore_state: bool = True, load_state_dict_strict: bool = True):
+def patch(
+ trainer: pl.Trainer, restore_state: bool = True, load_state_dict_strict: bool = True, update_fit_loop: bool = True
+):
"""Patches the PyTorch Lightning Trainer for usage with NVFlare.
Args:
@@ -39,6 +42,8 @@ def patch(trainer: pl.Trainer, restore_state: bool = True, load_state_dict_stric
load_state_dict_strict: exposes `strict` argument of `torch.nn.Module.load_state_dict()`
used to load the received model. Defaults to `True`.
See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict for details.
+ update_fit_loop: whether to increase `trainer.fit_loop.max_epochs` and `trainer.fit_loop.epoch_loop.max_steps` each FL round.
+ Defaults to `True`, which is suitable for most PyTorch Lightning applications.
Example:
@@ -75,7 +80,9 @@ def __init__(self):
callbacks = []
if not any(isinstance(cb, FLCallback) for cb in callbacks):
- fl_callback = FLCallback(rank=trainer.global_rank, load_state_dict_strict=load_state_dict_strict)
+ fl_callback = FLCallback(
+ rank=trainer.global_rank, load_state_dict_strict=load_state_dict_strict, update_fit_loop=update_fit_loop
+ )
callbacks.append(fl_callback)
if restore_state and not any(isinstance(cb, RestoreState) for cb in callbacks):
@@ -85,7 +92,7 @@ def __init__(self):
class FLCallback(Callback):
- def __init__(self, rank: int = 0, load_state_dict_strict: bool = True):
+ def __init__(self, rank: int = 0, load_state_dict_strict: bool = True, update_fit_loop: bool = True):
"""FL callback for lightning API.
Args:
@@ -93,6 +100,8 @@ def __init__(self, rank: int = 0, load_state_dict_strict: bool = True):
load_state_dict_strict: exposes `strict` argument of `torch.nn.Module.load_state_dict()`
used to load the received model. Defaults to `True`.
See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict for details.
+ update_fit_loop: whether to increase `trainer.fit_loop.max_epochs` and `trainer.fit_loop.epoch_loop.max_steps` each FL round.
+ Defaults to `True`, which is suitable for most PyTorch Lightning applications.
"""
super(FLCallback, self).__init__()
init(rank=str(rank))
@@ -108,6 +117,9 @@ def __init__(self, rank: int = 0, load_state_dict_strict: bool = True):
self._is_evaluation = False
self._is_submit_model = False
self._load_state_dict_strict = load_state_dict_strict
+ self._update_fit_loop = update_fit_loop
+
+ self.logger = logging.getLogger(self.__class__.__name__)
def reset_state(self, trainer):
"""Resets the state.
@@ -130,10 +142,12 @@ def reset_state(self, trainer):
# for next round
trainer.num_sanity_val_steps = 0 # Turn off sanity validation steps in following rounds of FL
- if self.total_local_epochs and self.max_epochs_per_round is not None:
- trainer.fit_loop.max_epochs = self.max_epochs_per_round + self.total_local_epochs
- if self.total_local_steps and self.max_steps_per_round is not None:
- trainer.fit_loop.epoch_loop.max_steps = self.max_steps_per_round + self.total_local_steps
+
+ if self._update_fit_loop:
+ if self.total_local_epochs and self.max_epochs_per_round is not None:
+ trainer.fit_loop.max_epochs = self.max_epochs_per_round + self.total_local_epochs
+ if self.total_local_steps and self.max_steps_per_round is not None:
+ trainer.fit_loop.epoch_loop.max_steps = self.max_steps_per_round + self.total_local_steps
# resets attributes
self.metrics = None
@@ -184,7 +198,15 @@ def _receive_and_update_model(self, trainer, pl_module):
model = self._receive_model(trainer)
if model:
if model.params:
- pl_module.load_state_dict(model.params, strict=self._load_state_dict_strict)
+ missing_keys, unexpected_keys = pl_module.load_state_dict(
+ model.params, strict=self._load_state_dict_strict
+ )
+ if len(missing_keys) > 0:
+ self.logger.warning(f"There were missing keys when loading the global state_dict: {missing_keys}")
+ if len(unexpected_keys) > 0:
+ self.logger.warning(
+ f"There were unexpected keys when loading the global state_dict: {unexpected_keys}"
+ )
if model.current_round is not None:
self.current_round = model.current_round
diff --git a/nvflare/app_opt/psi/dh_psi/dh_psi_task_handler.py b/nvflare/app_opt/psi/dh_psi/dh_psi_task_handler.py
index cc433954ea..5d84224534 100644
--- a/nvflare/app_opt/psi/dh_psi/dh_psi_task_handler.py
+++ b/nvflare/app_opt/psi/dh_psi/dh_psi_task_handler.py
@@ -49,6 +49,8 @@ def __init__(self, local_psi_id: str):
self.local_psi_handler: Optional[PSI] = None
self.client_name = None
self.items = None
+ # the JobAPI requires constructor arguments to be stored as instance attributes
+ self.local_psi_id = local_psi_id
def initialize(self, fl_ctx: FLContext):
super().initialize(fl_ctx)