forked from NVIDIA/NVFlare
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into support_client_hierarchy
- Loading branch information
Showing
14 changed files
with
1,825 additions
and
584 deletions.
There are no files selected for viewing
587 changes: 10 additions & 577 deletions
587
examples/advanced/distributed_optimization/1-consensus/tutorial.ipynb
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ torch | |
torchvision | ||
jsonargparse[signatures]>=4.17.0 | ||
pytorch_lightning | ||
tensorboard |
Empty file.
41 changes: 41 additions & 0 deletions
41
...ed_federated_learning/chapter-9_flare_low_level_apis/09.0_introduction/introduction.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# NVFlare low-level APIs\n", | ||
"\n", | ||
"In this chapter, we will explore some of the low-level APIs available in NVFlare to build fully custom components and algorithms.\n", | ||
"\n", | ||
"We'll explore the following topics:\n", | ||
"\n", | ||
"- [Building custom components](../09.1_building_custom_components/building_custom_components.ipynb) - we'll discuss how to implement custom `Controller`s and `Executor`s and instantiate them\n", | ||
"- [Sending and receiving tasks and data](../09.2_tasks_and_data_share/tasks_and_data_share.ipynb) - we'll see how to send and receive tasks and data between the `Controller` and `Executor`s\n", | ||
"- [Peer-to-peer communincation](../09.3_p2p_communication/p2p_communication.ipynb) - we'll see how to implement peer-to-peer communication between `Executor`s\n", | ||
"- [Implementing custom P2P algorithms](../09.4_implementing_p2p_algorithms/) - we'll put together what we learned in the previous sections to implement custom peer-to-peer algorithms. As a guiding example, we will walk through how to implement from scratch the `nvflare.app_opt.p2p` module, which implements peer-to-peer (P2P) distributed optimization algorithms.\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
241 changes: 241 additions & 0 deletions
241
...r-9_flare_low_level_apis/09.1_building_custom_components/building_custom_components.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,241 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Building custom components\n", | ||
"\n", | ||
"In NVFlare, the [`FLComponent`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.apis.fl_component.html#nvflare.apis.fl_component.FLComponent) is the base class of all components, including controllers, executors, responders, filters, aggregators and many others (see the [here](https://nvflare.readthedocs.io/en/main/programming_guide/fl_component.html) for more details). `FLComponent`s have the capability to handle and fire events and contain various methods for logging.\n", | ||
"\n", | ||
"In this notebook, we'll explore how to implement a very basic custom `Controller` and `Executor` in NVFlare:\n", | ||
"- a [`Controller`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.apis.impl.controller.html) is a server-side component responsible for managing job execution and orchestrating tasks across clients\n", | ||
"- an [`Executor`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.apis.executor.html) is a client-side component that processes tasks received from the controller and executes them accordingly\n", | ||
"\n", | ||
"For this first tutorial we won't go into the details of tasks and execution. Instead, we'll start by creating placeholder versions of these components and then add some very basic functionality to them to better understand the flow of execution and the roles of the different components involved." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## The Controller\n", | ||
"We'll start by creating a `PlaceholderController` class that doesn't perform any specific tasks but serves as a template for understanding how controllers work in NVFlare.\n", | ||
"\n", | ||
"As a subclass of `Controller`, our `PlaceholderController` must implement three methods:\n", | ||
"\n", | ||
"- `control_flow`: defining the main control flow of the controller. It receives an `abort_signal: Signal` and an `fl_ctx: FLContext`. A [`Signal`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.apis.signal.html#nvflare.apis.signal.Signal) is an object that provides a mechanism to signal events like abortion. Controllers and executors can check this signal to determine if they should stop execution gracefully. More details on handling abort signals [here](https://nvflare.readthedocs.io/en/main/best_practices.html#respect-the-abort-signal). An [`FLContext`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.apis.fl_context.html#nvflare.apis.fl_context.FLContext) object carries all the execution context, which includes information about the current execution environment, such as run number, job ID, and other configurations. It's passed to many methods to provide context. More details can be found [here](https://nvflare.readthedocs.io/en/main/programming_guide/fl_context.html). We'll talk more about it in the next notebooks.\n", | ||
"- `start_controller`: called once before the `control_flow` method. It's used to perform any setup tasks or initialize resources.\n", | ||
"- `stop_controller`: called once after the `control_flow` method. It's used to clean up resources or perform any finalization tasks.\n", | ||
"\n", | ||
"For the moment, we'll leave these methods empty. We'll add some functionality to them later in the notebook.\n", | ||
"\n", | ||
"```python\n", | ||
"from nvflare.apis.fl_context import FLContext\n", | ||
"from nvflare.apis.impl.controller import Controller\n", | ||
"from nvflare.apis.signal import Signal\n", | ||
"\n", | ||
"class PlaceholderController(Controller):\n", | ||
"\n", | ||
" def control_flow(self, abort_signal: Signal, fl_ctx: FLContext):\n", | ||
" pass\n", | ||
"\n", | ||
" def start_controller(self, fl_ctx: FLContext):\n", | ||
" pass\n", | ||
"\n", | ||
" def stop_controller(self, fl_ctx: FLContext):\n", | ||
" pass\n", | ||
"```\n", | ||
"\n", | ||
"Let's try to instantiate the controller and run it via the NVFlare simulator. As we've seen in previous chapters, objects sent to server and clients need to be importable by the simulator. Since we are in a notebook, we can't import the controller directly, so let's put it in a separate file `modules.py` and import it from there - for convenience, we have already included all the custom components we'll use in this chapter in `modules.py`, but feel free to delete the file and recreate it from scratch while we move forward." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from nvflare.job_config.api import FedJob\n", | ||
"from modules import PlaceholderController\n", | ||
"\n", | ||
"# Create job\n", | ||
"job = FedJob(name=\"placeholder_job\")\n", | ||
"\n", | ||
"# send controller to server\n", | ||
"controller = PlaceholderController()\n", | ||
"job.to_server(controller)\n", | ||
"\n", | ||
"# Run job via the NVFlare simulator\n", | ||
"job.simulator_run(\"./tmp/\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now, as you can see, trying to run the previous job results in an error. This is because we have only created a server running our `PlaceholderController` but ut we haven't sent any executors to the clients. \n", | ||
"\n", | ||
"To fix that, let's move on to the next section to explore `Executor`s and create a custom one." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## The Executor\n", | ||
"\n", | ||
"When subclassing an `Executor` the main method that must be implmented is the `execute` method, which defines how the `Executor` processes a task received from the `Controller`. It receives the following arguments in addition to an `abort_signal` and `fl_ctx`:\n", | ||
"- `task_name`: the name of the task to be executed\n", | ||
"- `shareable`: a [`Shareable`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.apis.shareable.html#nvflare.apis.shareable.Shareable) object containing the task data. We'll talk more about it in the next notebook but for the moment just note that this is the data structure used for communication between the server and clients. It wraps the data exchanged and can include metadata or headers. More details on `Shareable`s [here](https://nvflare.readthedocs.io/en/main/programming_guide/shareable.html).\n", | ||
"\n", | ||
"Let's create a `PlaceholderExecutor` class that doesn't perform any specific tasks but serves as a template for understanding how executors work in NVFlare.\n", | ||
"\n", | ||
"```python\n", | ||
"class PlaceholderExecutor(Executor):\n", | ||
"\n", | ||
" def execute(\n", | ||
" self,\n", | ||
" task_name: str,\n", | ||
" shareable: Shareable,\n", | ||
" fl_ctx: FLContext,\n", | ||
" abort_signal: Signal,\n", | ||
" ):\n", | ||
" pass\n", | ||
"```\n", | ||
"\n", | ||
"Let's now add 3 clients, assign our `PlaceholderExecutor` to each of them and run our job." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from modules import PlaceholderExecutor\n", | ||
"\n", | ||
"# Create job\n", | ||
"job = FedJob(name=\"placeholder_job\")\n", | ||
"\n", | ||
"# send controller to server\n", | ||
"controller = PlaceholderController()\n", | ||
"job.to_server(controller)\n", | ||
"\n", | ||
"# send executor to clients\n", | ||
"num_clients = 3\n", | ||
"for i in range(num_clients):\n", | ||
" executor = PlaceholderExecutor()\n", | ||
" job.to(executor, f\"site-{i}\")\n", | ||
"\n", | ||
"# Run job via the NVFlare simulator\n", | ||
"job.simulator_run(\"./tmp/\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"When you run the above code, you'll notice that the number of `Total clients` in the system increases from 1 to 3 as the clients start. However, since both the controller and executors are not performing any specific operations (their methods are empty), the job completes without any significant actions." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Logging info\n", | ||
"\n", | ||
"To explore the actual flow of execution, let's add some very basic logging to our `PlaceholderController` and `PlaceholderExecutor`. \n", | ||
"\n", | ||
"We want both our controller and executors to log when they start and stop but we'll do that in two different ways. \n", | ||
"\n", | ||
"For the controller, we'll do that in the `start_controller` and `stop_controller` methods.\n", | ||
"For the executors, we'll do that by overriding their `handle_event` method, which as the name suggests, is used to handle different events. Notice that NVFlare uses an event-driven architecture where components can fire and handle events to coordinate actions. By checking for `EventType.START_RUN` or `EventType.END_RUN` events we can log messages when the executor starts and stops (you can read more about event handling [here](https://nvflare.readthedocs.io/en/main/programming_guide/component_configuration.html#component-configuration-and-event-handling)).\n", | ||
"\n", | ||
"We can use the `log_info` method which is available to any subclass of `FLComponent` (which both `Controller` and `Executor` inherit from).\n", | ||
"\n", | ||
"```python\n", | ||
"class LoggingController(Controller):\n", | ||
" def control_flow(self, abort_signal: Signal, fl_ctx: FLContext):\n", | ||
" pass\n", | ||
"\n", | ||
" def start_controller(self, fl_ctx: FLContext):\n", | ||
" self.log_info(fl_ctx, \"Starting the controller...\")\n", | ||
"\n", | ||
" def stop_controller(self, fl_ctx: FLContext):\n", | ||
" self.log_info(fl_ctx, \"Stopping the controller...\")\n", | ||
"\n", | ||
"\n", | ||
"class LoggingExecutor(Executor):\n", | ||
" def execute(\n", | ||
" self,\n", | ||
" task_name: str,\n", | ||
" shareable: Shareable,\n", | ||
" fl_ctx: FLContext,\n", | ||
" abort_signal: Signal,\n", | ||
" ):\n", | ||
" pass\n", | ||
"\n", | ||
" def handle_event(self, event_type, fl_ctx):\n", | ||
" if event_type == EventType.START_RUN:\n", | ||
" self.log_info(fl_ctx, \"Starting the executor...\")\n", | ||
" elif event_type == EventType.END_RUN:\n", | ||
" self.log_info(fl_ctx, \"Stopping the executor...\")\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from modules import LoggingController, LoggingExecutor\n", | ||
"\n", | ||
"# Create job\n", | ||
"job = FedJob(name=\"logging_job\")\n", | ||
"\n", | ||
"# send controller to server\n", | ||
"controller = LoggingController()\n", | ||
"job.to_server(controller)\n", | ||
"\n", | ||
"# send executor to clients\n", | ||
"num_clients = 3\n", | ||
"for i in range(num_clients):\n", | ||
" executor = LoggingExecutor()\n", | ||
" job.to(executor, f\"site-{i}\")\n", | ||
"\n", | ||
"# Run job via the NVFlare simulator\n", | ||
"job.simulator_run(\"./tmp/\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now, if you look at the output of the cell above and look for the logs of the `LoggingController` and `LoggingExecutor`s, you should see that they are logging the start and stop events. You should also see that the controller is logging the start and stop events before and after the executors are logging theirs. We'll see more about that in the next section." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
69 changes: 69 additions & 0 deletions
69
...erated_learning/chapter-9_flare_low_level_apis/09.1_building_custom_components/modules.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
from nvflare.apis.event_type import EventType | ||
from nvflare.apis.executor import Executor | ||
from nvflare.apis.fl_context import FLContext | ||
from nvflare.apis.impl.controller import Controller | ||
from nvflare.apis.shareable import Shareable | ||
from nvflare.apis.signal import Signal | ||
|
||
|
||
class PlaceholderController(Controller): | ||
def control_flow(self, abort_signal: Signal, fl_ctx: FLContext): | ||
pass | ||
|
||
def start_controller(self, fl_ctx: FLContext): | ||
pass | ||
|
||
def stop_controller(self, fl_ctx: FLContext): | ||
pass | ||
|
||
|
||
class PlaceholderExecutor(Executor): | ||
def execute( | ||
self, | ||
task_name: str, | ||
shareable: Shareable, | ||
fl_ctx: FLContext, | ||
abort_signal: Signal, | ||
): | ||
pass | ||
|
||
|
||
class LoggingController(Controller): | ||
def control_flow(self, abort_signal: Signal, fl_ctx: FLContext): | ||
pass | ||
|
||
def start_controller(self, fl_ctx: FLContext): | ||
self.log_info(fl_ctx, "Starting the controller...") | ||
|
||
def stop_controller(self, fl_ctx: FLContext): | ||
self.log_info(fl_ctx, "Stopping the controller...") | ||
|
||
|
||
class LoggingExecutor(Executor): | ||
def execute( | ||
self, | ||
task_name: str, | ||
shareable: Shareable, | ||
fl_ctx: FLContext, | ||
abort_signal: Signal, | ||
): | ||
pass | ||
|
||
def handle_event(self, event_type, fl_ctx): | ||
if event_type == EventType.START_RUN: | ||
self.log_info(fl_ctx, "Starting the executor...") | ||
elif event_type == EventType.END_RUN: | ||
self.log_info(fl_ctx, "Stopping the executor...") |
Oops, something went wrong.