Nested Workspaces template #1322

Merged
merged 21 commits into from
Feb 2, 2025
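
Most of this change is a mechanical rename from flat template names to nested `framework/model` names. The sketch below summarizes that mapping under stated assumptions: `legacy_to_nested` is a hypothetical helper written for illustration, not part of OpenFL, and it lists only the pairs that can be read off this diff.

```shell
#!/bin/sh
# Hypothetical helper (not an OpenFL command): translate a legacy flat
# template name into the nested name used after this change.
# Only pairs visible in this diff are listed; any other name is
# returned unchanged.
legacy_to_nested() {
  case "$1" in
    torch_cnn_mnist)                  echo "torch/mnist" ;;
    torch_cnn_mnist_straggler_check)  echo "torch/mnist_straggler_check" ;;
    torch_cnn_mnist_eden_compression) echo "torch/mnist_eden_compression" ;;
    torch_cnn_histology)              echo "torch/histology" ;;
    torch_unet_kvasir)                echo "torch/unet_kvasir" ;;
    keras_cnn_mnist)                  echo "keras/mnist" ;;
    keras_nlp)                        echo "keras/nlp" ;;
    *)                                echo "$1" ;;
  esac
}
```

A caller that still holds an old name could pass it through the helper first, for example `fx workspace create --prefix ./ws --template "$(legacy_to_nested keras_cnn_mnist)"`.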
2 changes: 1 addition & 1 deletion .github/workflows/straggler-handling.yml
@@ -35,4 +35,4 @@ jobs:
pip install .
- name: Test Straggler Handling Interface
run: |
- python -m tests.github.test_hello_federation --template torch_cnn_mnist_straggler_check --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
+ python -m tests.github.test_hello_federation --template torch/mnist_straggler_check --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
22 changes: 11 additions & 11 deletions .github/workflows/task_runner_basic_e2e.yml
@@ -25,8 +25,8 @@ on:
type: choice
options:
- all
- - torch_cnn_mnist
- - keras_cnn_mnist
+ - torch/mnist
+ - keras/mnist
python_version:
description: "Python version"
required: false
@@ -85,21 +85,21 @@ jobs:
id: input_selection
run: |
# ---------------------------------------------------------------
- # Models like XGBoost (xgb_higgs) and torch_cnn_histology require runners with higher memory and CPU to run.
+ # Models like XGBoost (xgb_higgs) and torch/histology require runners with higher memory and CPU to run.
# Thus these models are excluded from the matrix for now.
# Default combination if no input is provided (i.e. 'all' is selected).
- # * TLS - models [torch_cnn_mnist, keras_cnn_mnist] and python versions [3.10, 3.11, 3.12]
- # * Non-TLS - models [torch_cnn_mnist] and python version [3.10]
- # * No client auth - models [keras_cnn_mnist] and python version [3.10]
- # * Memory logs - models [torch_cnn_mnist] and python version [3.10]
+ # * TLS - models [torch/mnist, keras/mnist] and python versions [3.10, 3.11, 3.12]
+ # * Non-TLS - models [torch/mnist] and python version [3.10]
+ # * No client auth - models [keras/mnist] and python version [3.10]
+ # * Memory logs - models [torch/mnist] and python version [3.10]
# ---------------------------------------------------------------
echo "jobs_to_run=${{ env.JOBS_TO_RUN }}" >> "$GITHUB_OUTPUT"

if [ "${{ env.MODEL_NAME }}" == "all" ]; then
- echo "models_for_tls=[\"torch_cnn_mnist\", \"keras_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
- echo "models_for_non_tls=[\"torch_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
- echo "models_for_no_client_auth=[\"keras_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
- echo "models_for_memory_logs=[\"torch_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
+ echo "models_for_tls=[\"torch/mnist\", \"keras/mnist\"]" >> "$GITHUB_OUTPUT"
+ echo "models_for_non_tls=[\"torch/mnist\"]" >> "$GITHUB_OUTPUT"
+ echo "models_for_no_client_auth=[\"keras/mnist\"]" >> "$GITHUB_OUTPUT"
+ echo "models_for_memory_logs=[\"torch/mnist\"]" >> "$GITHUB_OUTPUT"
else
echo "models_for_tls=[\"${{env.MODEL_NAME}}\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"${{env.MODEL_NAME}}\"]" >> "$GITHUB_OUTPUT"
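
The selection step in the hunk above can be tried outside GitHub Actions with a sketch like the following. It is an illustration, not the workflow's actual script: `select_models` is a hypothetical function name, the defaults for `MODEL_NAME` and `GITHUB_OUTPUT` are assumptions for local use, and the last two keys of the single-model branch are assumed to follow the same pattern as the first two because the diff is truncated at that point.

```shell
#!/bin/sh
# Assumed local defaults; in the workflow these come from the runner.
MODEL_NAME="${MODEL_NAME:-all}"
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Print one key=JSON-array line per job type for the requested model.
select_models() {
  if [ "$1" = "all" ]; then
    # Default combination, copied from the new side of the diff.
    echo 'models_for_tls=["torch/mnist", "keras/mnist"]'
    echo 'models_for_non_tls=["torch/mnist"]'
    echo 'models_for_no_client_auth=["keras/mnist"]'
    echo 'models_for_memory_logs=["torch/mnist"]'
  else
    # Assumption: every job type runs the single requested model.
    for key in tls non_tls no_client_auth memory_logs; do
      echo "models_for_${key}=[\"$1\"]"
    done
  fi
}

select_models "$MODEL_NAME" >> "$GITHUB_OUTPUT"
```

Because the nested names contain a slash, any later step that uses a model name as part of a file or artifact name may need to sanitize it first; nothing in this diff shows whether the workflows do so.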
8 changes: 4 additions & 4 deletions .github/workflows/task_runner_dockerized_ws_e2e.yml
@@ -32,7 +32,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10", "3.11", "3.12"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -73,7 +73,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -114,7 +114,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -155,7 +155,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

6 changes: 3 additions & 3 deletions .github/workflows/task_runner_fedeval_dws_e2e.yml
@@ -59,7 +59,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'tls' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -102,7 +102,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'non_tls' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -145,7 +145,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'no_client_auth' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

12 changes: 6 additions & 6 deletions .github/workflows/task_runner_fedeval_e2e.yml
@@ -34,9 +34,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
- # Models like XGBoost (xgb_higgs) and torch_cnn_histology require runners with higher memory and CPU to run.
+ # Models like XGBoost (xgb_higgs) and torch/histology require runners with higher memory and CPU to run.
# Thus these models are excluded from the matrix for now.
- model_name: ["torch_cnn_mnist", "keras_cnn_mnist"]
+ model_name: ["torch/mnist", "keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -77,9 +77,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
- # Testing this scenario only for torch_cnn_mnist model and python 3.10
+ # Testing this scenario only for torch/mnist model and python 3.10
# If required, this can be extended to other models and python versions
- model_name: ["torch_cnn_mnist"]
+ model_name: ["torch/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -120,9 +120,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
- # Testing this scenario for keras_cnn_mnist model and python 3.10
+ # Testing this scenario for keras/mnist model and python 3.10
# If required, this can be extended to other models and python versions
- model_name: ["keras_cnn_mnist"]
+ model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

2 changes: 1 addition & 1 deletion .github/workflows/taskrunner.yml
@@ -32,4 +32,4 @@ jobs:
pip install .
- name: Task Runner API
run: |
- python -m tests.github.test_hello_federation --template torch_cnn_mnist --fed_workspace aggregator --col1 collaborator1 --col2 collaborator2 --rounds-to-train 3 --save-model output_model
+ python -m tests.github.test_hello_federation --template torch/mnist --fed_workspace aggregator --col1 collaborator1 --col2 collaborator2 --rounds-to-train 3 --save-model output_model
2 changes: 1 addition & 1 deletion .github/workflows/taskrunner_eden_pipeline.yml
@@ -31,4 +31,4 @@ jobs:
pip install .
- name: Test TaskRunner API with Eden Compression
run: |
- python -m tests.github.test_hello_federation --template torch_cnn_mnist_eden_compression --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
+ python -m tests.github.test_hello_federation --template torch/mnist_eden_compression --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
2 changes: 1 addition & 1 deletion .github/workflows/tr_docker_gramine_direct.yml
@@ -27,7 +27,7 @@ jobs:

- name: Create workspace image
run: |
- fx workspace create --prefix example_workspace --template keras_cnn_mnist
+ fx workspace create --prefix example_workspace --template keras/mnist
cd example_workspace
fx plan initialize -a localhost

2 changes: 1 addition & 1 deletion .github/workflows/tr_docker_native.yml
@@ -27,7 +27,7 @@ jobs:

- name: Create workspace image
run: |
- fx workspace create --prefix example_workspace --template keras_cnn_mnist
+ fx workspace create --prefix example_workspace --template keras/mnist
cd example_workspace
fx plan initialize -a localhost
fx workspace dockerize --save --revision https://github.com/${GITHUB_REPOSITORY}.git@${{ github.event.pull_request.head.sha }}
2 changes: 1 addition & 1 deletion .github/workflows/ubuntu.yml
@@ -53,4 +53,4 @@ jobs:
pip install .
- name: Test TaskRunner API
run: |
- python -m tests.github.test_hello_federation --template keras_cnn_mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
+ python -m tests.github.test_hello_federation --template keras/mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
2 changes: 1 addition & 1 deletion .github/workflows/windows.yml
@@ -52,4 +52,4 @@ jobs:
pip install .
- name: Test TaskRunner API
run: |
- python -m tests.github.test_hello_federation --template keras_cnn_mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
+ python -m tests.github.test_hello_federation --template keras/mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
19 changes: 8 additions & 11 deletions Jenkinsfile
@@ -1,18 +1,15 @@
def snykData = [
'openfl-docker': 'openfl-docker/Dockerfile.base',
'openfl': 'setup.py',
- 'openfl-workspace_tf_2dunet': 'openfl-workspace/tf_2dunet/requirements.txt',
- 'openfl-workspace_torch_cnn_mnist_straggler_check': 'openfl-workspace/torch_cnn_mnist_straggler_check/requirements.txt',
+ 'openfl-workspace_keras_2dunet': 'openfl-workspace/keras/2dunet/requirements.txt',
+ 'openfl-workspace_torch_cnn_mnist_straggler_check': 'openfl-workspace/torch/mnist_straggler_check/requirements.txt',
// CN-14619 snyk test CLI does not support -f in requirements.txt file
- // 'openfl-workspace_torch_cnn_histology': 'openfl-workspace/torch_cnn_histology/requirements.txt',
- 'openfl-workspace_torch_cnn_histology_src': 'openfl-workspace/torch_cnn_histology/src/requirements.txt',
- 'openfl-workspace_keras_nlp': 'openfl-workspace/keras_nlp/requirements.txt',
- 'openfl-workspace_torch_cnn_mnist': 'openfl-workspace/torch_cnn_mnist/requirements.txt',
- 'openfl-workspace_torch_unet_kvasir': 'openfl-workspace/torch_unet_kvasir/requirements.txt',
- 'openfl-workspace_tf_cnn_histology': 'openfl-workspace/tf_cnn_histology/requirements.txt',
- 'openfl-workspace_tf_3dunet_brats': 'openfl-workspace/tf_3dunet_brats/requirements.txt',
- 'openfl-workspace_keras_cnn_with_compression': 'openfl-workspace/keras_cnn_with_compression/requirements.txt',
- 'openfl-workspace_keras_cnn_mnist': 'openfl-workspace/keras_cnn_mnist/requirements.txt',
+ // 'openfl-workspace_torch_cnn_histology': 'openfl-workspace/torch/histology/requirements.txt',
+ 'openfl-workspace_torch_cnn_histology_src': 'openfl-workspace/torch/histology/src/requirements.txt',
+ 'openfl-workspace_keras_nlp': 'openfl-workspace/keras/nlp/requirements.txt',
+ 'openfl-workspace_torch_cnn_mnist': 'openfl-workspace/torch/mnist/requirements.txt',
+ 'openfl-workspace_torch_unet_kvasir': 'openfl-workspace/torch/unet_kvasir/requirements.txt',
+ 'openfl-workspace_keras_cnn_mnist': 'openfl-workspace/keras/mnist/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_medmnist_2d_envoy': 'openfl-tutorials/interactive_api/PyTorch_MedMNIST_2D/envoy/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_dogscats_vit_workspace': 'openfl-tutorials/interactive_api/PyTorch_DogsCats_ViT/workspace/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_histology_envoy': 'openfl-tutorials/interactive_api/PyTorch_Histology/envoy/requirements.txt',
8 changes: 4 additions & 4 deletions docs/about/features_index/fed_eval.rst
@@ -26,7 +26,7 @@ Example Using the Task Runner API (Aggregator-based Workflow)

The following steps can be leveraged to achieve practical e2e usage of FedEval

- *N.B*: We will be using torch_cnn_mnist plan itself for both training and with some minor changes for evaluation as well
+ *N.B*: We will be using torch/mnist plan itself for both training and with some minor changes for evaluation as well

*Prerequisites*: Please ensure that OpenFL version==1.7 is installed or you can also choose to install latest from source.

@@ -48,13 +48,13 @@ With OpenFL version==1.7 aggregator start command is enhanced to have an optiona
--help Show this message and exit.

1. **Setup**
- We will use the `torch_cnn_mnist` workspace for training
+ We will use the `torch/mnist` workspace for training

Let's first configure a workspace with all necessary certificates

.. code-block:: shell

- fx workspace create --prefix ./cnn_train_eval --template torch_cnn_mnist
+ fx workspace create --prefix ./cnn_train_eval --template torch/mnist
cd cnn_train_eval
fx workspace certify
fx aggregator generate-cert-request
@@ -416,7 +416,7 @@ The updated plan post initialization with edits to make it ready for evaluation
metrics:
- loss

- We have done following changes to the initialized torch_cnn_mnist plan in the new workspace:
+ We have done following changes to the initialized torch/mnist plan in the new workspace:
- Set the rounds_to_train to 1 as evaluation needs just one round of federation run across the collaborators
- Removed all other training related tasks from assigner settings except "aggregated_model_validation"
Now let's replace the ``init.pbuf`` with the previously saved ``trained_model.pbuf``
14 changes: 7 additions & 7 deletions docs/about/features_index/taskrunner.rst
@@ -88,7 +88,7 @@ Each YAML top-level section contains the following subsections:

The following is an example of a **plan.yaml**:

- .. literalinclude:: ../../../openfl-workspace/torch_cnn_mnist/plan/plan.yaml
+ .. literalinclude:: ../../../openfl-workspace/torch/mnist/plan/plan.yaml
:language: yaml


@@ -150,22 +150,22 @@ STEP 1: Create a Workspace
$ fx


- 2. This example uses the :code:`keras_cnn_mnist` template.
+ 2. This example uses the :code:`keras/mnist` template.

- Set the environment variables to use the :code:`keras_cnn_mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory.
+ Set the environment variables to use the :code:`keras/mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory.

.. code-block:: shell

- $ export WORKSPACE_TEMPLATE=keras_cnn_mnist
+ $ export WORKSPACE_TEMPLATE=keras/mnist
$ export WORKSPACE_PATH=${HOME}/my_federation

3. Choose a workspace template. Templates are end-to-end federated learning training demonstrations. The following is a sample of available templates:

- - :code:`keras_cnn_mnist`: a workspace with a simple `Keras <http://keras.io/>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
+ - :code:`keras/mnist`: a workspace with a simple `Keras <http://keras.io/>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`tf_2dunet`: a workspace with a simple `TensorFlow <http://tensorflow.org>`__ CNN model that will use the `BraTS <https://www.med.upenn.edu/sbia/brats2017/data.html>`_ dataset and train in a federation.
- :code:`tf_cnn_histology`: a workspace with a simple `TensorFlow <http://tensorflow.org>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- - :code:`torch_cnn_histology`: a workspace with a simple `PyTorch <http://pytorch.org/>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- - :code:`torch_cnn_mnist`: a workspace with a simple `PyTorch <http://pytorch.org>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
+ - :code:`torch/histology`: a workspace with a simple `PyTorch <http://pytorch.org/>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
+ - :code:`torch/mnist`: a workspace with a simple `PyTorch <http://pytorch.org>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.

See the complete list of available templates.
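
With nested names, the set of available templates can be approximated by walking the template directory. The sketch below rests on one assumption: a template is any directory that contains `plan/plan.yaml`, as `openfl-workspace/torch/mnist/plan/plan.yaml` does in this diff. `list_templates` is a hypothetical helper, not an `fx` subcommand, and the pattern substitution assumes the root is passed without a trailing slash.

```shell
#!/bin/sh
# Hypothetical helper: print nested template names found under a root
# directory (default: openfl-workspace), one per line, sorted.
# Assumption: a template directory is one that contains plan/plan.yaml.
list_templates() {
  root="${1:-openfl-workspace}"
  find "$root" -type f -path '*/plan/plan.yaml' \
    | sed -e "s|^$root/||" -e 's|/plan/plan.yaml$||' \
    | sort
}
```

Run from a repository checkout, `list_templates openfl-workspace` would print names in the same `framework/model` form that `--template` accepts in the commands above.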

@@ -83,7 +83,7 @@ For logging through Tensorboard, enable the parameter :code:`write_logs : true`
settings :
write_logs : true

- Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found at `Federated_Pytorch_MNIST_Tutorial.ipynb <https://github.com/intel/openfl/blob/develop/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb>`_ and in the **torch_cnn_mnist** workspace.
+ Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found at `Federated_Pytorch_MNIST_Tutorial.ipynb <https://github.com/intel/openfl/blob/develop/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb>`_ and in the **torch/mnist** workspace.

1. Define the callback function in the **src** directory of your workspace, as you would with the Python API.

@@ -29,7 +29,7 @@ The following are the straggler handling algorithms supported in OpenFL:
Demonstration of adding the straggler handling interface
=========================================================

- The example template, **torch_cnn_mnist_straggler_check**, uses the ``PercentagePolicy``. To gain a better understanding of how experiments perform, you can modify the **percent_collaborators_needed** or **minimum_reporting** parameter in the template **plan.yaml** or even choose **CutoffTimePolicy** function instead:
+ The example template, **torch/mnist_straggler_check**, uses the ``PercentagePolicy``. To gain a better understanding of how experiments perform, you can modify the **percent_collaborators_needed** or **minimum_reporting** parameter in the template **plan.yaml** or even choose **CutoffTimePolicy** function instead:

.. code-block:: yaml
