[Workflow Interface] Refactor FLSpec and Runtime to enhance modularity #1363

Open: wants to merge 60 commits into base: develop

Changes from 51 commits

Commits (60)
9190c2e
local_runtime refactor initial commit
ishant162 Feb 6, 2025
c59508d
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 6, 2025
2aa528a
fix format
ishant162 Feb 6, 2025
9ec51ce
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 8, 2025
43ff9f9
Optimized code
ishant162 Feb 9, 2025
912119a
federated_runtime refactor
ishant162 Feb 10, 2025
f537e11
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 10, 2025
61f1a5d
federated_runtime refactor optimization
ishant162 Feb 10, 2025
a973959
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 10, 2025
d13ac8a
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 11, 2025
2bc3898
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 11, 2025
151484e
Updated flspec.py
ishant162 Feb 11, 2025
dcc7d61
updated aggregator.py
ishant162 Feb 11, 2025
d8a552c
update federated_runtime
ishant162 Feb 11, 2025
120878f
Merge branch 'develop' into runtime_refactor
ishant162 Feb 11, 2025
de6dbbb
updated federated_runtime
ishant162 Feb 11, 2025
ab75dca
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 12, 2025
a416b5d
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 12, 2025
4fce1b2
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 13, 2025
617a5b7
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 17, 2025
072fcfc
Updated federated_runtime.py
ishant162 Feb 17, 2025
af11aab
Incorporated internal review comments
ishant162 Feb 18, 2025
58d32af
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 18, 2025
3127f68
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 19, 2025
e7220bc
Incorporated internal review comments
ishant162 Feb 19, 2025
7732987
Incorporated review comments
ishant162 Feb 20, 2025
c4c5d9f
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 20, 2025
b984789
Incorporated review comments
ishant162 Feb 20, 2025
dc0d2af
Updated FederatedRuntime testcases
ishant162 Feb 20, 2025
de86fe6
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 20, 2025
f22a379
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 20, 2025
9a568c7
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 21, 2025
9fbcb23
update docstring
ishant162 Feb 21, 2025
b1653d3
updated and restructured code
ishant162 Feb 21, 2025
d8649b1
updating 101_MNIST
ishant162 Feb 24, 2025
56e5313
updating 101_MNIST
ishant162 Feb 24, 2025
62c5505
updating 101_MNIST
ishant162 Feb 24, 2025
b1ec130
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 24, 2025
5de433c
Incorporated review comments
ishant162 Feb 24, 2025
5143a7c
Update documentation
ishant162 Feb 24, 2025
52b702c
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 24, 2025
95076ee
Update documentation
ishant162 Feb 24, 2025
dba31f5
Incorporated review comments
ishant162 Feb 25, 2025
234a80c
Updated documentation
ishant162 Feb 25, 2025
0a51540
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 25, 2025
c49ae86
Merge branch 'develop' into runtime_refactor
payalcha Feb 25, 2025
abf6d44
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 27, 2025
3ad2e30
Review comments incorporated
ishant162 Feb 27, 2025
4fab86e
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 27, 2025
6a5fcb2
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 27, 2025
ecb7b63
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Feb 27, 2025
0e30108
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 3, 2025
b72e7de
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 3, 2025
c6a696b
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 3, 2025
d3696d4
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 4, 2025
d7b4b59
Update federated_runtime
ishant162 Mar 4, 2025
b1fc0ce
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 6, 2025
f28f842
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 7, 2025
3d45d8c
Updated testcase
ishant162 Mar 7, 2025
5438228
Merge branch 'securefederatedai:develop' into runtime_refactor
ishant162 Mar 7, 2025
29 changes: 19 additions & 10 deletions docs/about/features_index/workflowinterface.rst
@@ -18,14 +18,25 @@ A new OpenFL interface that gives significantly more flexibility to researchers in
There are several modifications we make in our reimagined version of this interface that are necessary for federated learning:

1. *Placement*: Metaflow's :code:`@step` decorator is replaced by placement decorators that specify where a task will run. In horizontal federated learning, there are server (or aggregator) and client (or collaborator) nodes. Tasks decorated by :code:`@aggregator` will run on the aggregator node, and :code:`@collaborator` will run on the collaborator node. These placement decorators are interpreted by *Runtime* implementations: these do the heavy lifting of figuring out how to get the state of the current task to another process or node.
2. *Runtime*: Each flow has a :code:`.runtime` attribute. The runtime encapsulates the details of the infrastructure where the flow will run. We support the LocalRuntime for simulating experiments on a local node and the FederatedRuntime to launch experiments on distributed infrastructure.
2. *Runtime*: The runtime encapsulates the details of the infrastructure where the flow will run. We support the LocalRuntime for simulating experiments on a local node and the FederatedRuntime to launch experiments on distributed infrastructure.
3. *Conditional branches*: Perform different tasks if a criterion is met
4. *Loops*: Loops are internal to a flow; this is necessary to support rounds of training where the same sequence of tasks is performed repeatedly (see the sketch below).

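A minimal sketch of a flow skeleton that puts these four ideas together is shown below. The :code:`FLSpec` import path matches the one used elsewhere in this PR; the placement-decorator module path, the class name, and the task and attribute names are illustrative assumptions rather than prescribed by the library.

.. code-block:: python

    from openfl.experimental.workflow.interface import FLSpec
    # Assumed module path for the placement decorators
    from openfl.experimental.workflow.placement import aggregator, collaborator

    class SketchFlow(FLSpec):

        @aggregator
        def start(self):
            # self.collaborators is populated automatically by the Runtime
            self.rounds = 3
            self.current_round = 0
            self.next(self.local_task, foreach='collaborators')

        @collaborator
        def local_task(self):
            self.score = 0.9  # placeholder metric computed on local data
            self.next(self.join)

        @aggregator
        def join(self, inputs):
            # 'inputs' holds the collaborator states; a real flow would aggregate them here
            self.current_round += 1
            # Conditional branch + internal loop: repeat until all rounds are done
            if self.current_round < self.rounds:
                self.next(self.local_task, foreach='collaborators')
            else:
                self.next(self.end)

        @aggregator
        def end(self):
            print('Flow complete')
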
How to use it?
==============

Let's start with the basics. A flow is intended to define the entirety of federated learning experiment. Every flow begins with the :code:`start` task and concludes with the :code:`end` task. At each step in the flow, attributes can be defined, modified, or deleted. Attributes get passed forward to the next step in the flow, which is defined by the name of the task passed to the :code:`next` function. In the line before each task, there is a **placement decorator**. The placement decorator defines where that task will be run. The OpenFL Workflow Interface adopts the conventions set by Metaflow, that every workflow begins with start and concludes with the end task. In the following example, the aggregator begins with an optionally passed in model and optimizer. The aggregator begins the flow with the start task, where the list of collaborators is extracted from the runtime (:code:`self.collaborators = self.runtime.collaborators`) and is then used as the list of participants to run the task listed in self.next, aggregated_model_validation. The model, optimizer, and anything that is not explicitly excluded from the next function will be passed from the start function on the aggregator to the aggregated_model_validation task on the collaborator. Where the tasks run is determined by the placement decorator that precedes each task definition (:code:`@aggregator` or :code:`@collaborator`). Once each of the collaborators (defined in the runtime) complete the aggregated_model_validation task, they pass their current state onto the train task, from train to local_model_validation, and then finally to join at the aggregator. It is in join that an average is taken of the model weights, and the next round can begin.
Let's start with the basics. A flow is intended to define the entirety of a federated learning experiment. Every flow begins with the :code:`start` task and concludes with the
:code:`end` task. At each step in the flow, attributes can be defined, modified, or deleted. Attributes get passed forward to the next step in the flow, which is defined by
the name of the task passed to the :code:`next` function.
In the line before each task, there is a **placement decorator**. The placement decorator defines where that task will be run (:code:`@aggregator` or :code:`@collaborator`).
The OpenFL Workflow Interface adopts the convention set by Metaflow that every workflow begins with the start task and concludes with the end task. In the following example, the
aggregator begins the flow with the :code:`start` task and an optionally passed-in model and optimizer. The list of collaborators in the federation, :code:`self.collaborators`,
is automatically populated by the Runtime infrastructure. It serves as the list of participants that run the task listed in :code:`self.next`, :code:`aggregated_model_validation`.
A contributor commented on lines +33 to +34:

Looks like this adds a reserved keyword collaborators to the flow spec. I assume this is set statically at the start of the workflow execution? Is there any way to get an updated collaborator list from inside the workflow? Use case: for the FederatedRuntime, if additional envoys register with the director, how could they join the federation after experiment start (assume TLS is set up and handled)? With the current API (self.collaborators = self.runtime.collaborators), the flow could (in a future release) get access to this list by querying the runtime. This is a capability of the RANO fork of OpenFL used in the real world, where not all collaborators will necessarily be ready (or known) at the start of an experiment.

The model, optimizer, and anything that is not explicitly excluded from the :code:`next` function will be passed from the :code:`start` function on the aggregator to the
:code:`aggregated_model_validation` task on the collaborator.
Once the collaborators (defined in the runtime) complete the :code:`aggregated_model_validation` task, they
pass their current state on to the :code:`train` task, from :code:`train` to :code:`local_model_validation`, and then finally to :code:`join` at the aggregator.
It is in :code:`join` that an average is taken of the model weights, and the next round can begin.

.. code-block:: python

@@ -45,9 +56,9 @@ Let's start with the basics. A flow is intended to define the entirety of federa
@aggregator
def start(self):
print(f'Performing initialization for model')
self.collaborators = self.runtime.collaborators
self.private = 10
self.current_round = 0
print(f'Collaborators participating in federation: {self.collaborators}')
self.next(self.aggregated_model_validation,foreach='collaborators',exclude=['private'])

@collaborator
@@ -237,20 +248,19 @@ Some important points to remember while creating callback function and private a
- In the above example, multiple collaborators share the same callback function and private attributes. Depending on the federated learning requirements, the user can specify a unique callback function or private attributes for each participant
- *Private attributes* need to be set after instantiating the participant (a sketch of this setup follows below).

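The participant construction itself is collapsed in this view; the sketch below shows the general pattern, assuming the :code:`Aggregator` and :code:`Collaborator` participant classes and the :code:`LocalRuntime` constructor arguments used in the OpenFL workflow tutorials. The shard strings returned by the callback are placeholders for real data loaders.

.. code-block:: python

    from openfl.experimental.workflow.interface import Aggregator, Collaborator
    from openfl.experimental.workflow.runtime import LocalRuntime

    def collaborator_private_attrs(index):
        # Callback invoked after the participant is instantiated; extra keyword
        # arguments passed to Collaborator(...) are forwarded to this callback
        return {
            'train_loader': f'train_shard_{index}',  # placeholder
            'test_loader': f'test_shard_{index}',    # placeholder
        }

    aggregator = Aggregator()

    collaborators = [
        Collaborator(
            name=name,
            private_attributes_callable=collaborator_private_attrs,
            index=idx,
        )
        for idx, name in enumerate(['Portland', 'Seattle'])
    ]

    local_runtime = LocalRuntime(
        aggregator=aggregator,
        collaborators=collaborators,
        backend='single_process',
    )
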
Now let's see how the runtime for a flow is assigned, and the flow gets run:
To run the flow, simply pass the flow instance to the :code:`run()` method of the runtime:

.. code-block:: python

flow = FederatedFlow()
flow.runtime = local_runtime
flow.run()
local_runtime.run(flow)

And that's it! This will run an instance of the :code:`FederatedFlow` on a single node in a single process.

LocalRuntime Backends
---------------------

The Runtime defines where code will run, but the Runtime has a :code:`Backend`, which defines the underlying implementation of *how* the flow will be executed. :code:`single_process` is the default in the :code:`LocalRuntime`: it executes all code sequentially within a single Python process, and is well suited to both high-spec and low-spec hardware.
The Runtime defines where code will run, but the Runtime has a :code:`backend`, which defines the underlying implementation of *how* the flow will be executed. :code:`single_process` is the default in the :code:`LocalRuntime`: it executes all code sequentially within a single Python process, and is well suited to both high-spec and low-spec hardware.

For users with large servers or multiple GPUs they wish to take advantage of, we also provide a :code:`ray` `<https://github.com/ray-project/ray>` backend. The Ray backend enables parallel task execution for collaborators, and optionally allows users to request dedicated CPUs / GPUs for Participants by using the :code:`num_cpus` and :code:`num_gpus` arguments while instantiating the Participant, in the following manner:

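The tutorial snippet referenced above is collapsed in this view; as a hedged illustration of the pattern, reusing the callback from the earlier sketch (the keyword arguments below follow the OpenFL workflow tutorials and should be treated as assumptions):

.. code-block:: python

    # Each participant can request dedicated resources from Ray
    collaborator = Collaborator(
        name='Portland',
        num_cpus=4,        # dedicated CPU cores for this participant
        num_gpus=0.5,      # fraction of a GPU reserved for this participant
        private_attributes_callable=collaborator_private_attrs,
        index=0,
    )

    # The 'ray' backend enables parallel execution of collaborator tasks
    local_runtime = LocalRuntime(
        aggregator=aggregator,
        collaborators=[collaborator],
        backend='ray',
    )
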
@@ -428,13 +438,12 @@ Below is an example of how to set up and instantiate a :code:`FederatedRuntime`:
tls=False
)

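Only the tail of that constructor call is visible here; a hedged sketch of the full setup, with the argument names treated as assumptions drawn from the FederatedRuntime tutorials, looks roughly like this:

.. code-block:: python

    from openfl.experimental.workflow.runtime import FederatedRuntime

    # Assumed argument names; check them against the actual FederatedRuntime signature
    director_info = {
        'director_node_fqdn': 'localhost',
        'director_port': 50050,
    }

    federated_runtime = FederatedRuntime(
        collaborators=['envoy_one', 'envoy_two'],  # envoy names registered with the Director
        director=director_info,
        notebook_path='./my_experiment.ipynb',     # notebook exported as the workspace
        tls=False,
    )
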
To distribute the experiment on the Federation, we now need to assign the federated_runtime to the flow and execute it.
To distribute the experiment on the Federation, we simply pass the flow instance to the :code:`run()` method of the :code:`FederatedRuntime`:

.. code-block:: python

flow = FederatedFlow()
flow.runtime = federated_runtime
flow.run()
federated_runtime.run(flow)

This will export the Jupyter notebook to a workspace and deploy it to the federation. The Director receives the experiment, distributes it to the Envoys, and initiates the execution of the experiment.

15 changes: 9 additions & 6 deletions openfl-tutorials/experimental/workflow/101_MNIST.ipynb
@@ -224,7 +224,12 @@
"scrolled": true
},
"source": [
"Now we come to the flow definition. The OpenFL Workflow Interface adopts the conventions set by Metaflow, that every workflow begins with `start` and concludes with the `end` task. The aggregator begins with an optionally passed in model and optimizer. The aggregator begins the flow with the `start` task, where the list of collaborators is extracted from the runtime (`self.collaborators = self.runtime.collaborators`) and is then used as the list of participants to run the task listed in `self.next`, `aggregated_model_validation`. The model, optimizer, and anything that is not explicitly excluded from the next function will be passed from the `start` function on the aggregator to the `aggregated_model_validation` task on the collaborator. Where the tasks run is determined by the placement decorator that precedes each task definition (`@aggregator` or `@collaborator`). Once each of the collaborators (defined in the runtime) complete the `aggregated_model_validation` task, they pass their current state onto the `train` task, from `train` to `local_model_validation`, and then finally to `join` at the aggregator. It is in `join` that an average is taken of the model weights, and the next round can begin.\n",
"Now we come to the flow definition. The OpenFL Workflow Interface adopts the conventions set by Metaflow, that every workflow begins with `start` and concludes with the `end` task. Task placement (i.e. where the tasks run) is determined by the placement decorator that precedes each task definition (`@aggregator` or `@collaborator`)\n",
"\n",
"The aggregator begins the flow with `start` task and optionally passed in model and optimizer. The list of collaborators in federation (`self.collaborators`) is automatically populated by LocalRuntime infrastructure and is then used as the list of participants to run the task listed in `self.next`, `aggregated_model_validation`. The model, optimizer, and anything that is not explicitly excluded from the next function will be passed from the `start` function on the aggregator to the `aggregated_model_validation` task on the collaborator\n",
"\n",
"Once each of the collaborators (defined in the runtime) complete the `aggregated_model_validation` task, they pass their current state onto the `train` task, from `train` to `local_model_validation`, and then finally to `join` at the aggregator. It is in `join` that an average is taken of the model weights, and the next round can begin.\n",
"\n",
"\n",
"![image.png](attachment:image.png)"
]
@@ -252,9 +257,9 @@
" @aggregator\n",
" def start(self):\n",
" print(f'Performing initialization for model')\n",
" self.collaborators = self.runtime.collaborators\n",
" self.private = 10\n",
" self.current_round = 0\n",
" print(f'Collaborators participating in federation: {self.collaborators}')\n",
" self.next(self.aggregated_model_validation, foreach='collaborators', exclude=['private'])\n",
"\n",
" @collaborator\n",
@@ -382,8 +387,7 @@
"best_model = None\n",
"optimizer = None\n",
"flflow = FederatedFlow(model, optimizer, rounds=2, checkpoint=True)\n",
"flflow.runtime = local_runtime\n",
"flflow.run()"
"local_runtime.run(flflow)"
]
},
{
@@ -425,8 +429,7 @@
"outputs": [],
"source": [
"flflow2 = FederatedFlow(model=flflow.model, optimizer=flflow.optimizer, rounds=2, checkpoint=True)\n",
"flflow2.runtime = local_runtime\n",
"flflow2.run()"
"local_runtime.run(flflow2)"
]
},
{
Changes to an additional tutorial notebook (filename not shown in this view):
@@ -339,9 +339,8 @@
" \"\"\"\n",
" print(f\"Initializing Workflow .... \")\n",
"\n",
" self.collaborators = self.runtime.collaborators\n",
" self.current_round = 0\n",
"\n",
" print(f'Collaborators participating in federation: {self.collaborators}')\n",
" self.next(self.aggregated_model_validation, foreach=\"collaborators\")\n",
"\n",
" @collaborator\n",
@@ -521,8 +520,7 @@
"model = None\n",
"optimizer = None\n",
"flflow = FederatedFlow_TorchMNIST(model, optimizer, learning_rate, momentum, rounds=2, checkpoint=True)\n",
"flflow.runtime = local_runtime\n",
"flflow.run()"
"local_runtime.run(flflow)"
]
},
{
@@ -635,7 +633,7 @@
"id": "87c487cb",
"metadata": {},
"source": [
"Now that we have our distributed infrastructure ready, let us modify the flow runtime to `FederatedRuntime` instance and deploy the experiment. \n",
"Now that we have our distributed infrastructure ready, the experiment is deployed onto the federation by providing the same `flflow` instance to `FederatedRuntime`.\n",
"\n",
"Progress of the flow is available on \n",
"1. Jupyter notebook: if `checkpoint` attribute of the flow object is set to `True`\n",
@@ -650,8 +648,7 @@
"outputs": [],
"source": [
"flflow.results = [] # clear results from previous run\n",
"flflow.runtime = federated_runtime\n",
"flflow.run()"
"federated_runtime.run(flflow)"
]
},
{
Changes to an additional tutorial notebook (filename not shown in this view):
@@ -279,8 +279,7 @@
" This is the start of the Flow.\n",
" \"\"\"\n",
" print(\"<Agg>: Start of flow ... \")\n",
" self.collaborators = self.runtime.collaborators\n",
"\n",
" print(f'Collaborators participating in federation: {self.collaborators}')\n",
" self.next(self.watermark_pretrain)\n",
"\n",
" @aggregator\n",
@@ -558,8 +557,7 @@
" watermark_retrain_optimizer,\n",
" checkpoint=True,\n",
")\n",
"flflow.runtime = federated_runtime\n",
"flflow.run()"
"federated_runtime.run(flflow)"
]
}
],
14 changes: 2 additions & 12 deletions openfl/experimental/workflow/component/aggregator/aggregator.py
@@ -15,9 +15,7 @@
import dill

from openfl.experimental.workflow.interface import FLSpec
from openfl.experimental.workflow.runtime import FederatedRuntime
from openfl.experimental.workflow.utilities import aggregator_to_collaborator, checkpoint
from openfl.experimental.workflow.utilities.metaflow_utils import MetaflowInterface

logger = getLogger(__name__)

@@ -125,13 +123,7 @@ def __init__(

self.flow = flow
self.checkpoint = checkpoint
self.flow._foreach_methods = []
logger.info("MetaflowInterface creation.")
self.flow._metaflow_interface = MetaflowInterface(self.flow.__class__, "single_process")
self.flow._run_id = self.flow._metaflow_interface.create_run()
self.flow.runtime = FederatedRuntime()
self.name = "aggregator"
self.flow.runtime.collaborators = self.authorized_cols

self.__private_attrs_callable = private_attributes_callable
self.__private_attrs = private_attributes
@@ -200,10 +192,8 @@ async def run_flow(self) -> FLSpec:
"""
# Start function will be the first step of any flow
f_name = "start"
# Creating a clones from the flow object
FLSpec._reset_clones()
FLSpec._create_clones(self.flow, self.flow.runtime.collaborators)

# Initialize the flow state
self.flow.initialize_flow_state(self.authorized_cols)
logger.info(f"Starting round {self.current_round}...")
while True:
next_step = self.do_task(f_name)
Expand Down