Nested Workspaces template #1322

Merged
merged 21 commits into from
Feb 2, 2025
Changes from 7 commits
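The renames in this diff follow one pattern: a flat template name becomes a `framework/model` path. As a rough illustration only, a script that updates old references might look like the sketch below. The `RENAMES` table, `to_nested` and `rewrite_command` are hypothetical helpers and are not part of OpenFL; the name pairs are read off the changed lines in this diff and may not be exhaustive.

```python
# Hypothetical helpers, not part of OpenFL. The pairs below are taken from
# the changed lines in this diff and may not cover every template.
RENAMES = {
    "torch_cnn_mnist": "torch/mnist",
    "keras_cnn_mnist": "keras/mnist",
    "torch_cnn_histology": "torch/histology",
    "torch_cnn_mnist_straggler_check": "torch/mnist_straggler_check",
    "torch_cnn_mnist_eden_compression": "torch/mnist_eden_compression",
    "torch_unet_kvasir": "torch/unet_kvasir",
    "keras_nlp": "keras/nlp",
}


def to_nested(template: str) -> str:
    """Return the nested name for a flat template name, or the input unchanged."""
    return RENAMES.get(template, template)


def rewrite_command(command: str) -> str:
    """Rewrite the value that follows --template in a command-line string."""
    parts = command.split()
    for i, part in enumerate(parts[:-1]):
        if part == "--template":
            parts[i + 1] = to_nested(parts[i + 1])
    return " ".join(parts)
```

Names that are not in the table (for example `xgb_higgs`) pass through unchanged, so the helper is safe to run over commands that already use the nested form.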
2 changes: 1 addition & 1 deletion .github/workflows/straggler-handling.yml
Original file line number Diff line number Diff line change
@@ -35,4 +35,4 @@ jobs:
pip install .
- name: Test Straggler Handling Interface
run: |
python -m tests.github.test_hello_federation --template torch_cnn_mnist_straggler_check --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
python -m tests.github.test_hello_federation --template torch/mnist_straggler_check --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
22 changes: 11 additions & 11 deletions .github/workflows/task_runner_basic_e2e.yml
@@ -25,8 +25,8 @@ on:
type: choice
options:
- all
- torch_cnn_mnist
- keras_cnn_mnist
- torch/mnist
- keras/mnist
python_version:
description: "Python version"
required: false
@@ -85,21 +85,21 @@ jobs:
id: input_selection
run: |
# ---------------------------------------------------------------
# Models like XGBoost (xgb_higgs) and torch_cnn_histology require runners with higher memory and CPU to run.
# Models like XGBoost (xgb_higgs) and torch/histology require runners with higher memory and CPU to run.
# Thus these models are excluded from the matrix for now.
# Default combination if no input is provided (i.e. 'all' is selected).
# * TLS - models [torch_cnn_mnist, keras_cnn_mnist] and python versions [3.10, 3.11, 3.12]
# * Non-TLS - models [torch_cnn_mnist] and python version [3.10]
# * No client auth - models [keras_cnn_mnist] and python version [3.10]
# * Memory logs - models [torch_cnn_mnist] and python version [3.10]
# * TLS - models [torch/mnist, keras/mnist] and python versions [3.10, 3.11, 3.12]
# * Non-TLS - models [torch/mnist] and python version [3.10]
# * No client auth - models [keras/mnist] and python version [3.10]
# * Memory logs - models [torch/mnist] and python version [3.10]
# ---------------------------------------------------------------
echo "jobs_to_run=${{ env.JOBS_TO_RUN }}" >> "$GITHUB_OUTPUT"

if [ "${{ env.MODEL_NAME }}" == "all" ]; then
echo "models_for_tls=[\"torch_cnn_mnist\", \"keras_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"torch_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_no_client_auth=[\"keras_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_memory_logs=[\"torch_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_tls=[\"torch/mnist\", \"keras/mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"torch/mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_no_client_auth=[\"keras/mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_memory_logs=[\"torch/mnist\"]" >> "$GITHUB_OUTPUT"
else
echo "models_for_tls=[\"${{env.MODEL_NAME}}\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"${{env.MODEL_NAME}}\"]" >> "$GITHUB_OUTPUT"
8 changes: 4 additions & 4 deletions .github/workflows/task_runner_dockerized_ws_e2e.yml
@@ -32,7 +32,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10", "3.11", "3.12"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -73,7 +73,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -114,7 +114,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -155,7 +155,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

6 changes: 3 additions & 3 deletions .github/workflows/task_runner_fedeval_dws_e2e.yml
@@ -59,7 +59,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'tls' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -102,7 +102,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'non_tls' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -145,7 +145,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'no_client_auth' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

12 changes: 6 additions & 6 deletions .github/workflows/task_runner_fedeval_e2e.yml
@@ -34,9 +34,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
# Models like XGBoost (xgb_higgs) and torch_cnn_histology require runners with higher memory and CPU to run.
# Models like XGBoost (xgb_higgs) and torch/histology require runners with higher memory and CPU to run.
# Thus these models are excluded from the matrix for now.
model_name: ["torch_cnn_mnist", "keras_cnn_mnist"]
model_name: ["torch/mnist", "keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -77,9 +77,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
# Testing this scenario only for torch_cnn_mnist model and python 3.10
# Testing this scenario only for torch/mnist model and python 3.10
# If required, this can be extended to other models and python versions
model_name: ["torch_cnn_mnist"]
model_name: ["torch/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -120,9 +120,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
# Testing this scenario for keras_cnn_mnist model and python 3.10
# Testing this scenario for keras/mnist model and python 3.10
# If required, this can be extended to other models and python versions
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

2 changes: 1 addition & 1 deletion .github/workflows/taskrunner.yml
@@ -32,4 +32,4 @@ jobs:
pip install .
- name: Task Runner API
run: |
python -m tests.github.test_hello_federation --template torch_cnn_mnist --fed_workspace aggregator --col1 collaborator1 --col2 collaborator2 --rounds-to-train 3 --save-model output_model
python -m tests.github.test_hello_federation --template torch/mnist --fed_workspace aggregator --col1 collaborator1 --col2 collaborator2 --rounds-to-train 3 --save-model output_model
2 changes: 1 addition & 1 deletion .github/workflows/taskrunner_eden_pipeline.yml
@@ -31,4 +31,4 @@ jobs:
pip install .
- name: Test TaskRunner API with Eden Compression
run: |
python -m tests.github.test_hello_federation --template torch_cnn_mnist_eden_compression --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
python -m tests.github.test_hello_federation --template torch/mnist_eden_compression --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
2 changes: 1 addition & 1 deletion .github/workflows/tr_docker_gramine_direct.yml
@@ -27,7 +27,7 @@ jobs:

- name: Create workspace image
run: |
fx workspace create --prefix example_workspace --template keras_cnn_mnist
fx workspace create --prefix example_workspace --template keras/mnist
cd example_workspace
fx plan initialize -a localhost

2 changes: 1 addition & 1 deletion .github/workflows/tr_docker_native.yml
@@ -27,7 +27,7 @@ jobs:

- name: Create workspace image
run: |
fx workspace create --prefix example_workspace --template keras_cnn_mnist
fx workspace create --prefix example_workspace --template keras/mnist
cd example_workspace
fx plan initialize -a localhost
fx workspace dockerize --save --revision https://github.com/${GITHUB_REPOSITORY}.git@${{ github.event.pull_request.head.sha }}
2 changes: 1 addition & 1 deletion .github/workflows/ubuntu.yml
@@ -53,4 +53,4 @@ jobs:
pip install .
- name: Test TaskRunner API
run: |
python -m tests.github.test_hello_federation --template keras_cnn_mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
python -m tests.github.test_hello_federation --template keras/mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
2 changes: 1 addition & 1 deletion .github/workflows/windows.yml
@@ -52,4 +52,4 @@ jobs:
pip install .
- name: Test TaskRunner API
run: |
python -m tests.github.test_hello_federation --template keras_cnn_mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
python -m tests.github.test_hello_federation --template keras/mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
19 changes: 8 additions & 11 deletions Jenkinsfile
@@ -1,18 +1,15 @@
def snykData = [
'openfl-docker': 'openfl-docker/Dockerfile.base',
'openfl': 'setup.py',
'openfl-workspace_tf_2dunet': 'openfl-workspace/tf_2dunet/requirements.txt',
'openfl-workspace_torch_cnn_mnist_straggler_check': 'openfl-workspace/torch_cnn_mnist_straggler_check/requirements.txt',
'openfl-workspace_keras_2dunet': 'openfl-workspace/keras/2dunet/requirements.txt',
'openfl-workspace_torch_cnn_mnist_straggler_check': 'openfl-workspace/torch/mnist_straggler_check/requirements.txt',
// CN-14619 snyk test CLI does not support -f in requirements.txt file
// 'openfl-workspace_torch_cnn_histology': 'openfl-workspace/torch_cnn_histology/requirements.txt',
'openfl-workspace_torch_cnn_histology_src': 'openfl-workspace/torch_cnn_histology/src/requirements.txt',
'openfl-workspace_keras_nlp': 'openfl-workspace/keras_nlp/requirements.txt',
'openfl-workspace_torch_cnn_mnist': 'openfl-workspace/torch_cnn_mnist/requirements.txt',
'openfl-workspace_torch_unet_kvasir': 'openfl-workspace/torch_unet_kvasir/requirements.txt',
'openfl-workspace_tf_cnn_histology': 'openfl-workspace/tf_cnn_histology/requirements.txt',
'openfl-workspace_tf_3dunet_brats': 'openfl-workspace/tf_3dunet_brats/requirements.txt',
'openfl-workspace_keras_cnn_with_compression': 'openfl-workspace/keras_cnn_with_compression/requirements.txt',
'openfl-workspace_keras_cnn_mnist': 'openfl-workspace/keras_cnn_mnist/requirements.txt',
// 'openfl-workspace_torch_cnn_histology': 'openfl-workspace/torch/histology/requirements.txt',
'openfl-workspace_torch_cnn_histology_src': 'openfl-workspace/torch/histology/src/requirements.txt',
'openfl-workspace_keras_nlp': 'openfl-workspace/keras/nlp/requirements.txt',
'openfl-workspace_torch_cnn_mnist': 'openfl-workspace/torch/mnist/requirements.txt',
'openfl-workspace_torch_unet_kvasir': 'openfl-workspace/torch/unet_kvasir/requirements.txt',
'openfl-workspace_keras_cnn_mnist': 'openfl-workspace/keras/mnist/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_medmnist_2d_envoy': 'openfl-tutorials/interactive_api/PyTorch_MedMNIST_2D/envoy/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_dogscats_vit_workspace': 'openfl-tutorials/interactive_api/PyTorch_DogsCats_ViT/workspace/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_histology_envoy': 'openfl-tutorials/interactive_api/PyTorch_Histology/envoy/requirements.txt',
8 changes: 4 additions & 4 deletions docs/about/features_index/fed_eval.rst
@@ -26,7 +26,7 @@ Example Using the Task Runner API (Aggregator-based Workflow)

The following steps can be leveraged to achieve practical e2e usage of FedEval

*N.B*: We will be using torch_cnn_mnist plan itself for both training and with some minor changes for evaluation as well
*N.B*: We will use the torch/mnist plan for training and, with some minor changes, for evaluation as well

*Prerequisites*: Please ensure that OpenFL version==1.7 is installed, or install the latest version from source.

@@ -48,13 +48,13 @@ With OpenFL version==1.7 aggregator start command is enhanced to have an optiona
--help Show this message and exit.

1. **Setup**
We will use the `torch_cnn_mnist` workspace for training
We will use the `torch/mnist` workspace for training

Let's first configure a workspace with all necessary certificates

.. code-block:: shell

fx workspace create --prefix ./cnn_train_eval --template torch_cnn_mnist
fx workspace create --prefix ./cnn_train_eval --template torch/mnist
cd cnn_train_eval
fx workspace certify
fx aggregator generate-cert-request
@@ -416,7 +416,7 @@ The updated plan post initialization with edits to make it ready for evaluation
metrics:
- loss

We have done following changes to the initialized torch_cnn_mnist plan in the new workspace:
We have made the following changes to the initialized torch/mnist plan in the new workspace:
- Set the rounds_to_train to 1 as evaluation needs just one round of federation run across the collaborators
- Removed all other training related tasks from assigner settings except "aggregated_model_validation"
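
The two plan edits listed above can be sketched on an in-memory plan. This is only an illustration: the dictionary assumes a plan layout with `aggregator.settings.rounds_to_train` and `assigner.settings.task_groups`, and the task group contents are made up for the example, so check your own **plan.yaml** for the actual keys. The `make_eval_plan` helper is hypothetical and not part of OpenFL.

```python
import copy


def make_eval_plan(plan: dict) -> dict:
    """Return a copy of `plan` reduced to a single evaluation round."""
    eval_plan = copy.deepcopy(plan)
    # Evaluation needs only one federation round.
    eval_plan["aggregator"]["settings"]["rounds_to_train"] = 1
    # Keep only the aggregated-model validation task in every task group.
    for group in eval_plan["assigner"]["settings"]["task_groups"]:
        group["tasks"] = [
            task for task in group["tasks"] if task == "aggregated_model_validation"
        ]
    return eval_plan


# Illustrative input (assumed layout); real plans contain many more sections.
training_plan = {
    "aggregator": {"settings": {"rounds_to_train": 10}},
    "assigner": {
        "settings": {
            "task_groups": [
                {
                    "name": "train_and_validate",
                    "percentage": 1.0,
                    "tasks": [
                        "aggregated_model_validation",
                        "train",
                        "locally_tuned_model_validation",
                    ],
                }
            ]
        }
    },
}
eval_plan = make_eval_plan(training_plan)
```

Working on a deep copy leaves the training plan untouched, so the same workspace can still be used for further training runs.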
Now let's replace the ``init.pbuf`` with the previously saved ``trained_model.pbuf``
14 changes: 7 additions & 7 deletions docs/about/features_index/taskrunner.rst
@@ -88,7 +88,7 @@ Each YAML top-level section contains the following subsections:

The following is an example of a **plan.yaml**:

.. literalinclude:: ../../../openfl-workspace/torch_cnn_mnist/plan/plan.yaml
.. literalinclude:: ../../../openfl-workspace/torch/mnist/plan/plan.yaml
:language: yaml


@@ -150,22 +150,22 @@ STEP 1: Create a Workspace
$ fx


2. This example uses the :code:`keras_cnn_mnist` template.
2. This example uses the :code:`keras/mnist` template.

Set the environment variables to use the :code:`keras_cnn_mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory.
Set the environment variables to use the :code:`keras/mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory.

.. code-block:: shell

$ export WORKSPACE_TEMPLATE=keras_cnn_mnist
$ export WORKSPACE_TEMPLATE=keras/mnist
$ export WORKSPACE_PATH=${HOME}/my_federation

3. Choose a workspace template. Templates are end-to-end federated learning training demonstrations. The following is a sample of available templates:

- :code:`keras_cnn_mnist`: a workspace with a simple `Keras <http://keras.io/>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`keras/mnist`: a workspace with a simple `Keras <http://keras.io/>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`tf_2dunet`: a workspace with a simple `TensorFlow <http://tensorflow.org>`__ CNN model that will use the `BraTS <https://www.med.upenn.edu/sbia/brats2017/data.html>`_ dataset and train in a federation.
- :code:`tf_cnn_histology`: a workspace with a simple `TensorFlow <http://tensorflow.org>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- :code:`torch_cnn_histology`: a workspace with a simple `PyTorch <http://pytorch.org/>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- :code:`torch_cnn_mnist`: a workspace with a simple `PyTorch <http://pytorch.org>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`torch/histology`: a workspace with a simple `PyTorch <http://pytorch.org/>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- :code:`torch/mnist`: a workspace with a simple `PyTorch <http://pytorch.org>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.

See the complete list of available templates.
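
With nested templates, a template name is a relative path under the workspace directory. A minimal sketch of listing them follows, assuming (as the paths in this diff suggest) that every template directory contains `plan/plan.yaml`. The `list_templates` helper is hypothetical and is not how `fx` itself enumerates templates.

```python
from pathlib import Path


def list_templates(workspace_root: Path) -> list[str]:
    """List template names (e.g. 'torch/mnist') found under workspace_root."""
    names = []
    for plan_file in workspace_root.rglob("plan.yaml"):
        # Only count files laid out as <template>/plan/plan.yaml.
        if plan_file.parent.name != "plan":
            continue
        template_dir = plan_file.parent.parent
        names.append(template_dir.relative_to(workspace_root).as_posix())
    return sorted(names)
```

Each returned name has the same form as the value given to `fx workspace create --template` in the examples above.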

Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ For logging through Tensorboard, enable the parameter :code:`write_logs : true`
settings :
write_logs : true

Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found at `Federated_Pytorch_MNIST_Tutorial.ipynb <https://github.com/intel/openfl/blob/develop/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb>`_ and in the **torch_cnn_mnist** workspace.
Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found at `Federated_Pytorch_MNIST_Tutorial.ipynb <https://github.com/intel/openfl/blob/develop/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb>`_ and in the **torch/mnist** workspace.

1. Define the callback function in the **src** directory in your workspace, as you would with the Python API.

@@ -95,9 +95,9 @@ Follow the steps below to write your custom callback function instead. As an exa
defaults : plan/defaults/aggregator.yaml
template : openfl.component.Aggregator
settings :
init_state_path : save/torch_cnn_mnist_init.pbuf
best_state_path : save/torch_cnn_mnist_best.pbuf
last_state_path : save/torch_cnn_mnist_last.pbuf
init_state_path : save/torch/mnist_init.pbuf
best_state_path : save/torch/mnist_best.pbuf
last_state_path : save/torch/mnist_last.pbuf
rounds_to_train : 10
write_logs : true
log_metric_callback :
Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@ The following are the straggler handling algorithms supported in OpenFL:
Demonstration of adding the straggler handling interface
=========================================================

The example template, **torch_cnn_mnist_straggler_check**, uses the ``PercentageBasedStragglerHandling``. To gain a better understanding of how experiments perform, you can modify the **percent_collaborators_needed** or **minimum_reporting** parameter in the template **plan.yaml** or even choose **CutoffTimeBasedStragglerHandling** function instead:
The example template, **torch/mnist_straggler_check**, uses ``PercentageBasedStragglerHandling``. To gain a better understanding of how experiments perform, you can modify the **percent_collaborators_needed** or **minimum_reporting** parameter in the template **plan.yaml**, or choose the **CutoffTimeBasedStragglerHandling** function instead:

.. code-block:: yaml
