Hello, we are running into GPU out-of-memory errors when training with the SimAM_ResNet34_ASP series of models.
The configuration is as follows:
batch_size: 2
spk_model: SimAM_ResNet34_ASP
spk_model_init: ./wespeaker_models/voxblink2_samresnet34_ft/avg_model.pt
tse_model: BSRNN
After about 1600 training steps, the following error is raised:
File "..../speech_separation/tse/wesep/wesep/models/bsrnn.py", line 41, in forward
rnn_output, _ = self.rnn(self.norm(input).transpose(1, 2).contiguous())
File "/opt/conda/envs/py310torch201/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py310torch201/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 812, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 608.00 MiB (GPU 1; 23.69 GiB total capacity; 21.86 GiB already allocated; 334.94 MiB free; 22.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173992 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173994 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173995 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173996 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173997 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173998 closing signal SIGTERM
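The OOM message itself suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. A minimal sketch of how that could be wired into the training entry script (assumption: the variable must be set before the first CUDA allocation; the 128 MiB value is only an example, not a recommended setting):

import os

# Must be set before torch initializes its CUDA caching allocator;
# 128 is an example split size in MiB.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the env var on purpose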
Even after setting batch_size to 1 and calling torch.cuda.empty_cache(), we still hit the same out-of-memory error.
Why does this happen?
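For reference, a minimal diagnostic sketch (not part of wesep; log_cuda_memory is a hypothetical helper) that logs GPU memory every few hundred steps, which would show whether allocated memory keeps growing over time (e.g. progressively longer utterances reaching the BSRNN LSTM) or stays flat:

import torch

def log_cuda_memory(step, device=None, every=100):
    # Print current, reserved and peak CUDA memory in MiB every `every` steps.
    if step % every != 0 or not torch.cuda.is_available():
        return
    allocated = torch.cuda.memory_allocated(device) / 2**20
    reserved = torch.cuda.memory_reserved(device) / 2**20
    peak = torch.cuda.max_memory_allocated(device) / 2**20
    print(f"step {step}: allocated={allocated:.0f} MiB, "
          f"reserved={reserved:.0f} MiB, peak={peak:.0f} MiB")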