Commit 43e1530
Showing 6 changed files with 299 additions and 191 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -38,3 +38,12 @@ about the new command, please refer to the [README](README.md).
- 🌟 Abstract classes for new models/dataloaders.
- 🌟 Allows Federated Learning with Personalization.
- Personalization allows you to leverage each client's local data to obtain models that are better adjusted to their own data distribution. You can run the `cv` task to try out this feature.


## [1.0.1] - 2023-07-29

🔋 This release removes the restriction on the minimum number of GPUs required by FLUTE,
allowing users to run experiments with a single-GPU worker by instantiating both the server
and the clients on the same device. For more documentation about how to run experiments
using a single GPU, please refer to the [README](README.md).
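
For reference, this is the single-GPU launch command introduced in the [README](README.md) by this release (quoted verbatim; see the README for the surrounding setup steps):

```
python -m torch.distributed.run --nproc_per_node=1 e2e_trainer.py -dataPath ./testing -outputPath scratch -config testing/hello_world_nlg_gru.yaml -task nlg_gru -backend nccl
```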

12 changes: 10 additions & 2 deletions README.md
@@ -7,7 +7,7 @@ Welcome to FLUTE (Federated Learning Utilities for Testing and Experimentation),
FLUTE is a PyTorch-based orchestration environment enabling GPU- or CPU-based FL simulations. The primary goal of FLUTE is to enable researchers to rapidly prototype and validate their ideas. Features include:

- large scale simulation (millions of clients, sampling tens of thousands per round)
- multi-GPU and multi-node orchestration
- single/multi GPU and multi-node orchestration
- local or global differential privacy
- model quantization
- a variety of standard optimizers and aggregation methods
@@ -74,11 +74,19 @@ FLUTE uses torch.distributed API as its main communication backbone, supporting

After this initial setup, you can use the data created for the integration test for a first local run. Note that this data needs to be downloaded manually inside the `testing` folder; for more instructions, please look at [the README file inside `testing`](testing/README.md).

For single-GPU runs:

```
python -m torch.distributed.run --nproc_per_node=1 e2e_trainer.py -dataPath ./testing -outputPath scratch -config testing/hello_world_nlg_gru.yaml -task nlg_gru -backend nccl
```

For multi-GPU runs (3 GPUs):

```
python -m torch.distributed.run --nproc_per_node=3 e2e_trainer.py -dataPath ./testing -outputPath scratch -config testing/hello_world_nlg_gru.yaml -task nlg_gru -backend nccl
```

This config uses 1 node with 3 workers (1 server, 2 clients). The config file `testing/hello_world_nlg_gru.yaml` has some comments explaining the major sections and some important details; essentially, it consists of a very short experiment in which a couple of iterations are run for just a few clients. A `scratch` folder will be created containing detailed logs.
The config file `testing/hello_world_nlg_gru.yaml` has some comments explaining the major sections and some important details; essentially, it consists of a very short experiment in which a couple of iterations are run for just a few clients. A `scratch` folder will be created containing detailed logs.
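
As a rough sketch of the kind of sections such a config contains (illustrative only; the key names below are assumptions based on typical FLUTE configs, and `testing/hello_world_nlg_gru.yaml` itself is the authoritative reference):

```
# Illustrative skeleton only -- see testing/hello_world_nlg_gru.yaml for the real keys and values.
model_config:     # which model to build and its hyperparameters
  ...
server_config:    # orchestration: number of rounds, clients sampled per round, aggregation
  max_iteration: 2               # "a couple of iterations"
  num_clients_per_iteration: 10  # "just a few clients"
  ...
client_config:    # local training settings (optimizer, data loading) for each client
  ...
```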

## Documentation

7 changes: 4 additions & 3 deletions core/evaluation.py
@@ -20,7 +20,7 @@

class Evaluation():

def __init__(self, config, model_path, process_testvalidate, idx_val_clients, idx_test_clients):
def __init__(self, config, model_path, process_testvalidate, idx_val_clients, idx_test_clients, single_worker):

self.config = config
self.model_path = model_path
@@ -29,6 +29,7 @@ def __init__(self, config, model_path, process_testvalidate, idx_val_clients, id
self.idx_val_clients = idx_val_clients
self.idx_test_clients = idx_test_clients
self.send_dicts = config['server_config'].get('send_dicts', False)
self.single_worker = single_worker
super().__init__()

def run(self, eval_list, req, metric_logger=None):
@@ -155,7 +156,7 @@ def run_distributed_evaluation(self, mode, clients, model):
total = 0
self.logits = {'predictions': [], 'probabilities': [], 'labels': []}
server_data = (0.0, model, 0)
for result in self.process_testvalidate(clients, server_data, mode):
for result in self.process_testvalidate(clients, server_data, mode, self.single_worker):
output, metrics, count = result
val_metrics = {key: {'value':0, 'higher_is_better': False} for key in metrics.keys()} if total == 0 else val_metrics

@@ -190,7 +191,7 @@ def make_eval_clients(dataset, config):
'''

total = sum(dataset.num_samples)
clients = federated.size() - 1
clients = federated.size() - 1 if federated.size() > 1 else federated.size()
delta = total / clients + 1
threshold = delta
current_users_idxs = list()
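
As a simplified sketch of what the changed line in `make_eval_clients` does (assuming `federated.size()` reports the torch.distributed world size, i.e. the total number of workers):

```
# Illustrative sketch, not FLUTE code: the client-count logic added above.
def num_eval_workers(world_size: int) -> int:
    # With several workers, one is reserved for the server, so evaluation
    # clients are spread over world_size - 1 workers. With a single worker
    # (the new single-GPU mode), the same device hosts both the server and
    # the clients, so that one worker is used.
    return world_size - 1 if world_size > 1 else world_size

assert num_eval_workers(3) == 2   # 1 server + 2 client workers
assert num_eval_workers(1) == 1   # single-GPU: server and clients share one worker
```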
