Merge branch 'main' into bionemo_examples
holgerroth authored Feb 22, 2024
2 parents 427e679 + 3460970 commit 2dace43
Showing 313 changed files with 8,272 additions and 3,220 deletions.
3 changes: 0 additions & 3 deletions .github/workflows/markdown-links-check.yml
@@ -17,10 +17,7 @@ name: Check Markdown links

on:
push:
branches: [ "main", "dev" ]
pull_request:
# The branches below must be a subset of the branches above
branches: [ "main", "dev" ]

jobs:
markdown-link-check:
2 changes: 0 additions & 2 deletions .github/workflows/premerge.yml
@@ -17,8 +17,6 @@ name: pre-merge
on:
# quick tests for pull requests and the releasing branches
push:
branches:
- dev
pull_request:
workflow_dispatch:

5 changes: 4 additions & 1 deletion docs/_static/css/additions.css
@@ -1,3 +1,6 @@
.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{display:block;background:#b1b1b1;padding:.4045em 7.3em}
.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{display:block;background:#a9a9a9;padding:.4045em 8.8em}
.wy-menu-vertical li.toctree-l5{font-size: .9em;}
.wy-menu-vertical li.toctree-l5{font-size: .9em;}
.wy-menu > .caption > span.caption-text {
color: #76b900;
}
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -44,7 +44,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
# -- Project information -----------------------------------------------------

project = "NVIDIA FLARE"
copyright = "2023, NVIDIA"
copyright = "2024, NVIDIA"
author = "NVIDIA"

# The full version, including alpha/beta/rc tags
@@ -114,6 +114,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
html_scaled_image_link = False
html_show_sourcelink = True
html_favicon = "favicon.ico"
html_logo = "resources/nvidia_logo.png"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
47 changes: 36 additions & 11 deletions docs/example_applications_algorithms.rst
@@ -6,13 +6,7 @@ Example Applications
NVIDIA FLARE has several tutorials and examples to help you get started with federated learning and to explore certain features in the
:github_nvflare_link:`examples directory <examples>`.

1. Step-By-Step Example Series
==============================

* :github_nvflare_link:`Step-by-Step CIFAR-10 Examples (GitHub) <examples/hello-world/step-by-step/cifar10>` - Step-by-step example series with CIFAR-10 (image data) to showcase different FLARE features, workflows, and APIs.
* :github_nvflare_link:`Step-by-Step HIGGS Examples (GitHub) <examples/hello-world/step-by-step/higgs>` - Step-by-step example series with HIGGS (tabular data) to showcase different FLARE features, workflows, and APIs.

2. Hello World Examples
1. Hello World Examples
=======================
Can be run from the :github_nvflare_link:`hello_world notebook <examples/hello-world/hello_world.ipynb>`.

@@ -22,27 +16,58 @@ Can be run from the :github_nvflare_link:`hello_world notebook <examples/hello-world/hello_world.ipynb>`.

examples/hello_world_examples

2.1. Deep Learning to Federated Learning
1.1. Deep Learning to Federated Learning
----------------------------------------

* :github_nvflare_link:`Deep Learning to Federated Learning (GitHub) <examples/hello-world/ml-to-fl>` - Example for converting Deep Learning (DL) to Federated Learning (FL) using the Client API.

2.2. Workflows
1.2. Workflows
--------------

* :ref:`Hello Scatter and Gather <hello_scatter_and_gather>` - Example using the Scatter And Gather (SAG) workflow with a Numpy trainer
* :ref:`Hello Cross-Site Validation <hello_cross_val>` - Example using the Cross Site Model Eval workflow with a Numpy trainer
* :ref:`Hello Cross-Site Validation <hello_cross_val>` - Example using the Cross Site Model Eval workflow with a Numpy trainer; also demonstrates running cross-site validation using the previous training results.
* :github_nvflare_link:`Hello Cyclic Weight Transfer (GitHub) <examples/hello-world/hello-cyclic>` - Example using the CyclicController workflow to implement `Cyclic Weight Transfer <https://pubmed.ncbi.nlm.nih.gov/29617797/>`_ with TensorFlow as the deep learning training framework
* :github_nvflare_link:`Swarm Learning <examples/advanced/swarm_learning>` - Example using Swarm Learning and Client-Controlled Cross-site Evaluation workflows.
* :github_nvflare_link:`Client-Controlled Cyclic Weight Transfer <examples/hello-world/step-by-step/cifar10/cyclic_ccwf>` - Example using Client-Controlled Cyclic workflow using Client API.
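The cyclic workflows above pass a single model around the clients in turn. A toy sketch in plain Python (not the NVFlare API; the site names and one-parameter "model" are made up for illustration):

```python
# Toy Cyclic Weight Transfer: the model travels around the ring of
# clients; each client trains on its local data before passing it on.

def local_train(weight, data, lr=0.1):
    # Stand-in for real training: nudge the weight toward the local mean.
    local_mean = sum(data) / len(data)
    return weight + lr * (local_mean - weight)

clients = {"site-1": [1.0, 2.0, 3.0], "site-2": [5.0, 6.0], "site-3": [9.0]}

weight = 0.0
for round_num in range(10):              # rounds of cyclic transfer
    for name, data in clients.items():   # pass the model client -> client
        weight = local_train(weight, data)
```

In NVFlare, this round-robin handoff is what the server-controlled CyclicController and the client-controlled cyclic workflow orchestrate across real sites.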

2.3. Deep Learning
1.3. Deep Learning
------------------

* :ref:`Hello PyTorch <hello_pt>` - Example image classifier using FedAvg and PyTorch as the deep learning training framework
* :ref:`Hello TensorFlow <hello_tf2>` - Example image classifier using FedAvg and TensorFlow as the deep learning training framework



2. Step-By-Step Example Series
==============================

:github_nvflare_link:`Step-by-Step Examples (GitHub) <examples/hello-world/step-by-step/>` - Step-by-step example series with CIFAR-10 (image data) and HIGGS (tabular data) to showcase different FLARE features, workflows, and APIs.

2.1 CIFAR-10 Image Data Examples
--------------------------------

* :github_nvflare_link:`image_stats <examples/hello-world/step-by-step/cifar10/stats/image_stats.ipynb>` - federated statistics (histograms) of CIFAR-10.
* :github_nvflare_link:`sag <examples/hello-world/step-by-step/cifar10/sag/sag.ipynb>` - scatter and gather (SAG) workflow with PyTorch using the Client API.
* :github_nvflare_link:`sag_deploy_map <examples/hello-world/step-by-step/cifar10/sag_deploy_map/sag_deploy_map.ipynb>` - scatter and gather workflow with deploy_map configuration for deployment of apps to different sites using the Client API.
* :github_nvflare_link:`sag_model_learner <examples/hello-world/step-by-step/cifar10/sag_model_learner/sag_model_learner.ipynb>` - scatter and gather workflow illustrating how to write client code using the ModelLearner.
* :github_nvflare_link:`sag_executor <examples/hello-world/step-by-step/cifar10/sag_executor/sag_executor.ipynb>` - scatter and gather workflow demonstrating how to write client-side executors.
* :github_nvflare_link:`sag_mlflow <examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb>` - MLflow experiment tracking logs with the Client API in scatter & gather workflows.
* :github_nvflare_link:`sag_he <examples/hello-world/step-by-step/cifar10/sag_he/sag_he.ipynb>` - homomorphic encryption using the Client API and POC -he mode.
* :github_nvflare_link:`cse <examples/hello-world/step-by-step/cifar10/cse/cse.ipynb>` - cross-site evaluation using the Client API.
* :github_nvflare_link:`cyclic <examples/hello-world/step-by-step/cifar10/cyclic/cyclic.ipynb>` - cyclic weight transfer workflow with server-side controller.
* :github_nvflare_link:`cyclic_ccwf <examples/hello-world/step-by-step/cifar10/cyclic_ccwf/cyclic_ccwf.ipynb>` - client-controlled cyclic weight transfer workflow with client-side controller.
* :github_nvflare_link:`swarm <examples/hello-world/step-by-step/cifar10/swarm/swarm.ipynb>` - swarm learning and client-side cross-site evaluation with the Client API.

2.2 HIGGS Tabular Data Examples
-------------------------------

* :github_nvflare_link:`tabular_stats <examples/hello-world/step-by-step/higgs/stats/tabular_stats.ipynb>` - federated statistics (histogram) calculation on tabular data.
* :github_nvflare_link:`sklearn_linear <examples/hello-world/step-by-step/higgs/sklearn-linear/sklearn_linear.ipynb>` - federated linear model (logistic regression for binary classification) learning on tabular data.
* :github_nvflare_link:`sklearn_svm <examples/hello-world/step-by-step/higgs/sklearn-svm/sklearn_svm.ipynb>` - federated SVM model learning on tabular data.
* :github_nvflare_link:`sklearn_kmeans <examples/hello-world/step-by-step/higgs/sklearn-kmeans/sklearn_kmeans.ipynb>` - federated k-Means clustering on tabular data.
* :github_nvflare_link:`xgboost <examples/hello-world/step-by-step/higgs/xgboost/xgboost_horizontal.ipynb>` - federated horizontal XGBoost learning on tabular data with bagging collaboration.


3. Tutorial Notebooks
=====================

12 changes: 6 additions & 6 deletions docs/examples/fl_experiment_tracking_mlflow.rst
@@ -53,10 +53,10 @@ Adding MLflow Logging to Configurations

Inside the config folder there are two files, ``config_fed_client.json`` and ``config_fed_server.json``.

.. literalinclude:: ../../examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-mlflow/app/config/config_fed_client.json
:language: json
.. literalinclude:: ../../examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-mlflow/app/config/config_fed_client.conf
:language:
:linenos:
:caption: config_fed_client.json
:caption: config_fed_client.conf

Take a look at the components section of the client config at line 24.
The first component is the ``pt_learner`` which contains the initialization, training, and validation logic.
@@ -69,10 +69,10 @@ within NVFlare with the information to track.
Finally, :class:`ConvertToFedEvent<nvflare.app_common.widgets.convert_to_fed_event.ConvertToFedEvent>` converts local events to federated events.
This changes the event ``analytix_log_stats`` into a fed event ``fed.analytix_log_stats``, which will then be streamed from the clients to the server.

.. literalinclude:: ../../examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-mlflow/app/config/config_fed_server.json
:language: json
.. literalinclude:: ../../examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-mlflow/app/config/config_fed_server.conf
:language:
:linenos:
:caption: config_fed_server.json
:caption: config_fed_server.conf

Under the component section in the server config, we have the
:class:`MLflowReceiver<nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLflowReceiver>`. This component receives
64 changes: 64 additions & 0 deletions docs/fl_introduction.rst
@@ -0,0 +1,64 @@
.. _fl_introduction:

###########################
What is Federated Learning?
###########################

Federated Learning is a distributed learning paradigm where training occurs across multiple clients, each with their own local datasets.
This enables the creation of common robust models without sharing sensitive local data, helping solve issues of data privacy and security.

How does Federated Learning Work?
=================================
The federated learning (FL) server orchestrates the collaboration of multiple clients by first sending an initial model to the FL clients.
The clients perform training on their local datasets, then send the model updates back to the FL server for aggregation to form a global model.
This process forms a single round of federated learning, and after a number of rounds, a robust global model can be developed.

.. image:: resources/fl_diagram.png
:height: 500px
:align: center
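The round described above can be sketched in a few lines of plain Python. This is a toy FedAvg illustration, not NVFlare's implementation; the client names, datasets, and one-parameter "model" are invented:

```python
# Toy FedAvg: the server broadcasts a global model, each client trains
# locally, and the server aggregates updates weighted by dataset size.

def local_train(global_weight, data, lr=0.5):
    # Stand-in for real training: move toward the local data's mean.
    target = sum(data) / len(data)
    return global_weight + lr * (target - global_weight)

clients = {"hospital-a": [2.0, 4.0], "hospital-b": [6.0, 8.0, 10.0]}

global_weight = 0.0
for _ in range(20):  # rounds of federated learning
    updates = {name: local_train(global_weight, d) for name, d in clients.items()}
    total = sum(len(d) for d in clients.values())
    # Weighted aggregation: larger local datasets contribute more.
    global_weight = sum(updates[n] * len(d) / total for n, d in clients.items())
```

With these made-up datasets the global weight converges to the size-weighted average of the local means, which is the behavior weighted FedAvg aggregation is designed to produce.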

FL Terms and Definitions
========================

- FL server: manages job lifecycle, orchestrates workflow, assigns tasks to clients, performs aggregation
- FL client: executes tasks, performs local computation/learning with local dataset, submits result back to FL server
- FL algorithms: FedAvg, FedOpt, FedProx etc. implemented as workflows

.. note::

    Here we describe the centralized version of FL, where the FL server has the role of the aggregator node. However, in a decentralized version such as
    swarm learning, FL clients can serve as the aggregator node instead.

- Types of FL

- horizontal FL: clients hold different data samples over the same features
- vertical FL: clients hold different features over an overlapping set of data samples
  - swarm learning: a decentralized subset of FL where orchestration and aggregation are performed by the clients
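The horizontal/vertical distinction can be made concrete with a toy table; the data, column names, and site names below are all hypothetical:

```python
# A centralized dataset: rows are samples (patients), columns are features.
columns = ["age", "income", "blood_pressure"]
rows = {
    "patient-1": [34, 50_000, 120],
    "patient-2": [58, 72_000, 135],
    "patient-3": [41, 61_000, 128],
}

# Horizontal FL: sites hold different SAMPLES over the SAME features.
site_a = {k: rows[k] for k in ["patient-1"]}
site_b = {k: rows[k] for k in ["patient-2", "patient-3"]}

# Vertical FL: sites hold different FEATURES for overlapping samples.
site_x = {k: v[:2] for k, v in rows.items()}   # holds age, income
site_y = {k: v[2:] for k, v in rows.items()}   # holds blood_pressure
```

Together, the horizontal sites cover all samples, while each vertical site sees every sample but only a slice of its features.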

Main Benefits
=============

Enhanced Data Privacy and Security
----------------------------------
Federated learning facilitates data privacy and data locality by ensuring that the data remains at each site.
Additionally, privacy-preserving techniques such as homomorphic encryption and differential privacy filters can be leveraged to further protect the transferred data.

Improved Accuracy and Diversity
-------------------------------
By training with a variety of data sources across different clients, a robust and generalizable global model can be developed to better represent heterogeneous datasets.

Scalability and Network Efficiency
----------------------------------
With the ability to perform training at the edge, federated learning can scale globally.
Additionally, transferring only the model weights rather than entire datasets makes efficient use of network resources.

Applications
============
An important application of federated learning is in the healthcare sector, where data privacy regulations and patient record confidentiality make training models challenging.
Federated learning can help break down these healthcare data silos to allow hospitals and medical institutions to collaborate and pool their medical knowledge without the need to share their data.
Some common use cases involve classification and detection tasks, drug discovery with federated protein LLMs, and federated analytics on medical devices.

Furthermore, there are many other areas and industries such as financial fraud detection, autonomous vehicles, HPC, mobile applications, etc.
where the ability to use distributed data silos while maintaining data privacy is essential for the development of better models.

Read on to learn how FLARE is built as a flexible federated computing framework to enable federated learning from research to production.
2 changes: 1 addition & 1 deletion docs/flare_overview.rst
@@ -26,7 +26,7 @@ Built for productivity
FLARE is designed for maximum productivity, providing a range of tools to enhance user experience and research efficiency at different stages of the development process:

- **FLARE Client API:** Enables users to transition seamlessly from ML/DL to FL with just a few lines of code changes.
- **Simulator CLI:** Allows users to simulate federated learning or computing jobs in multi-thread settings within a single computer, offering quick response and debugging. The same job can be deployed directly to production.
- **Simulator CLI:** Allows users to simulate federated learning or computing jobs in multi-process settings within a single computer, offering quick response and debugging. The same job can be deployed directly to production.
- **POC CLI:** Facilitates the simulation of federated learning or computing jobs in multi-process settings within one computer. Different processes represent server, clients, and an admin console, providing users with a realistic sense of the federated network. It also allows users to simulate project deployment on a single host.
- **Job CLI:** Permits users to create and submit jobs directly in POC or production environments.
- **FLARE API:** Enables users to run jobs directly from Python code or notebooks.
20 changes: 16 additions & 4 deletions docs/getting_started.rst
@@ -22,14 +22,14 @@ Clone NVFLARE repo to get examples, switch main branch (latest stable branch)
$ git clone https://github.com/NVIDIA/NVFlare.git
$ cd NVFlare
$ git switch main
$ git switch 2.4
Note on branches:

* The `main <https://github.com/NVIDIA/NVFlare/tree/main>`_ branch is the default (unstable) development branch

* The 2.0, 2.1, 2.2, and 2.3 etc. branches are the branches for each major release and minor patches
* The 2.1, 2.2, 2.3, and 2.4 etc. branches are the branches for each major release and minor patches


Quick Start with Simulator
@@ -63,6 +63,14 @@ establishing a secure, distributed FL workflow.
Installation
=============

.. note::
    The server and client versions of nvflare must match; we do not support cross-version compatibility.

Supported Operating Systems
---------------------------
- Linux
- OSX (Note: some optional dependencies are not compatible, such as tenseal and openmined.psi)

Python Version
--------------

@@ -117,7 +125,6 @@ You may find that the pip and setuptools versions in the venv need updating:
(nvflare-env) $ python3 -m pip install -U pip
(nvflare-env) $ python3 -m pip install -U setuptools
Install Stable Release
----------------------

@@ -127,6 +134,11 @@ Stable releases are available on `NVIDIA FLARE PyPI <https://pypi.org/project/nv
$ python3 -m pip install nvflare
.. note::

    In addition to the dependencies included when installing nvflare, many of our example applications have additional packages that must be installed.
    Make sure to install from any requirements.txt files before running the examples.
    See :github_nvflare_link:`nvflare/app_opt <nvflare/app_opt>` for modules and components with optional dependencies.

.. _containerized_deployment:

@@ -210,7 +222,7 @@ Production mode is secure with TLS certificates - depending the choice the deplo

- HA or non-HA
- Local or remote
- On-premise or on cloud
- On-premise or on cloud (See :ref:`cloud_deployment`)

Using non-HA, secure, local mode (all clients and server running on the same host), production mode is very similar to POC mode except it is secure.

31 changes: 24 additions & 7 deletions docs/index.rst
@@ -5,23 +5,37 @@ NVIDIA FLARE
.. toctree::
:maxdepth: -1
:hidden:
:caption: Introduction

fl_introduction
flare_overview
whats_new
getting_started

.. toctree::
:maxdepth: -1
:hidden:
:caption: Guides

example_applications_algorithms
real_world_fl
user_guide
programming_guide
best_practices

.. toctree::
:maxdepth: -1
:hidden:
:caption: Miscellaneous

faq
publications_and_talks
contributing
API <apidocs/modules>
glossary

NVIDIA FLARE (NVIDIA Federated Learning Application Runtime Environment) is a domain-agnostic, open-source, extensible SDK that allows
researchers and data scientists to adaptexisting ML/DL workflows (PyTorch, RAPIDS, Nemo, TensorFlow) to a federated paradigm; and enables
researchers and data scientists to adapt existing ML/DL workflows (PyTorch, RAPIDS, Nemo, TensorFlow) to a federated paradigm; and enables
platform developers to build a secure, privacy preserving offering for a distributed multi-party collaboration.

NVIDIA FLARE is built on a componentized architecture that gives you the flexibility to take federated learning workloads from research
@@ -34,18 +48,21 @@ and simulation to real-world production deployment. Some of the key components
- **Management tools** for secure provisioning and deployment, orchestration, and management
- **Specification-based API** for extensibility

Learn more in the :ref:`FLARE Overview <flare_overview>`, :ref:`Key Features <key_features>`, :ref:`What's New <whats_new>`, and the
:ref:`User Guide <user_guide>` and :ref:`Programming Guide <programming_guide>`.
Learn more about FLARE features in the :ref:`FLARE Overview <flare_overview>` and :ref:`What's New <whats_new>`.

Getting Started
===============
For first-time users and FL researchers, FLARE provides the :ref:`fl_simulator` that allows you to build, test, and deploy applications locally.
The :ref:`Getting Started guide <getting_started>` covers installation and walks through an example application using the FL Simulator.
For first-time users and FL researchers, FLARE provides the :ref:`FL Simulator <fl_simulator>` that allows you to build, test, and deploy applications locally.
The :ref:`Getting Started <getting_started>` guide covers installation and walks through an example application using the FL Simulator.
Additional examples can be found in the :ref:`Example Applications <example_applications_algorithms>`, which showcase different federated learning workflows and algorithms on various machine learning and deep learning tasks.

FLARE for Users
===============
If you want to learn how to interact with the FLARE system, please refer to the :ref:`User Guide <user_guide>`.
When you are ready for a secure, distributed deployment, the :ref:`Real World Federated Learning <real_world_fl>` section covers the tools and process
required to deploy and operate a secure, real-world FLARE project.

FLARE for Developers
====================
When you're ready to build your own application, the :ref:`Programming Best Practices <best_practices>`, :ref:`FAQ<faq>`, and
:ref:`Programming Guide <programming_guide>` give an in depth look at the FLARE platform and APIs.
When you're ready to build your own application, the :ref:`Programming Guide <programming_guide>`, :ref:`Programming Best Practices <best_practices>`, :ref:`FAQ <faq>`, and :ref:`API Reference <apidocs/modules>`
give an in-depth look at the FLARE platform and APIs.