[MetricCollection/DDP] Get TypeError: _sync_dist() got an unexpected keyword argument 'process_group' when calling compute(). #438
Eleven1Liu asked this question in Q&A; it was answered by SkafteNicki.
Hi, I was trying to run a MetricCollection under DDP. Below are the Python code and the error. Are there any suggestions?

Environment:
- CUDA 10.1
- Python 3.6
- pytorch-lightning 1.3.5
- torch 1.8.0
- torchmetrics 0.4.1
```python
import torch
import pytorch_lightning as pl

# args, checkpoint_callback and earlystopping_callback are defined elsewhere
trainer = pl.Trainer(logger=False,
                     num_sanity_val_steps=0,
                     gpus=torch.cuda.device_count(),
                     accelerator='ddp',
                     progress_bar_refresh_rate=0,
                     max_epochs=args.epochs,
                     callbacks=[checkpoint_callback, earlystopping_callback])


class Model(pl.LightningModule):
    ...

    def validation_epoch_end(self, step_outputs):
        return self._shared_eval_epoch_end(step_outputs, 'val')

    def _shared_eval_epoch_end(self, step_outputs, split):
        # self.eval_metric is a MetricCollection
        metric_dict = self.eval_metric.compute()  # error starts from here
        self.log_dict(metric_dict)
        ...
        return metric_dict
    ...
```

Error: the traceback starts from executing `self.eval_metric.compute()`.
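The `TypeError` in the title is Python's standard error for calling a function with a keyword argument it does not declare. One generic way this happens, offered here only as an assumption and not as the confirmed cause in this thread, is a signature mismatch: a newer base class passes `process_group=` into a `_sync_dist` method that an older or custom subclass overrides with a narrower signature. The classes below (`BaseMetric`, `OldStyleMetric`, `UpdatedMetric`) and the helper `try_sync` are hypothetical stand-ins, not the torchmetrics API.

```python
# Hypothetical stand-ins (NOT the real torchmetrics classes) showing how an
# override with an older signature triggers "unexpected keyword argument".


class BaseMetric:
    """Newer-style base class: its sync step forwards a process_group keyword."""

    def sync(self, process_group=None):
        # The caller passes process_group by keyword, as a newer API might.
        return self._sync_dist(lambda x: x, process_group=process_group)

    def _sync_dist(self, dist_sync_fn, process_group=None):
        return "synced"


class OldStyleMetric(BaseMetric):
    """Override written against an older signature without process_group."""

    def _sync_dist(self, dist_sync_fn):
        return "synced (old)"


class UpdatedMetric(BaseMetric):
    """Override that accepts extra keyword arguments and forwards them."""

    def _sync_dist(self, dist_sync_fn, **kwargs):
        return super()._sync_dist(dist_sync_fn, **kwargs)


def try_sync(metric):
    """Return the sync result, or the TypeError message if the call fails."""
    try:
        return metric.sync()
    except TypeError as err:
        return str(err)


# Message contains: unexpected keyword argument 'process_group'
print(try_sync(OldStyleMetric()))
print(try_sync(UpdatedMetric()))  # synced
```

Keeping overrides signature-compatible with the base class (or accepting `**kwargs`) avoids this class of error; whether it applies here depends on what `self.eval_metric` actually contains, which the snippet elides.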
Answered on Aug 10, 2021 (1 comment, 4 replies).
Answer selected by Eleven1Liu.
Hi @Eleven1Liu,
Would it be possible for you to send a fully reproducible script?
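A reproducible report usually pairs the script with the exact library versions, like the Environment list in the question. Below is a small sketch for collecting them programmatically. It assumes Python 3.8+ for the standard-library `importlib.metadata` module; on the question's Python 3.6, the `importlib_metadata` backport package would be needed instead. The helper name `package_versions` is hypothetical.

```python
import platform
from importlib import metadata  # standard library in Python 3.8+


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print the interpreter version and the libraries listed in the question.
    print("python", platform.python_version())
    for name, version in package_versions(
        ["torch", "pytorch-lightning", "torchmetrics"]
    ).items():
        print(name, version or "not installed")
```

Missing packages map to `None` rather than raising, so the same script works on machines with partial installs.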