CUDA out of memory #14

Open
xczhusuda opened this issue Oct 29, 2024 · 0 comments
xczhusuda commented Oct 29, 2024

Hello, we are running into GPU out-of-memory errors when training with the SimAM_ResNet34_ASP family of speaker models.
Our settings are as follows:
```
batch_size: 2
spk_model: SimAM_ResNet34_ASP
spk_model_init: ./wespeaker_models/voxblink2_samresnet34_ft/avg_model.pt
tse_model: BSRNN
```
After about 1600 training steps, the following error is raised:
File "..../speech_separation/tse/wesep/wesep/models/bsrnn.py", line 41, in forward
rnn_output, _ = self.rnn(self.norm(input).transpose(1, 2).contiguous())
File "/opt/conda/envs/py310torch201/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py310torch201/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 812, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 608.00 MiB (GPU 1; 23.69 GiB total capacity; 21.86 GiB already allocated; 334.94 MiB free; 22.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173992 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173994 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173995 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173996 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173997 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 173998 closing signal SIGTERM
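The error message suggests setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. For reference, this is a minimal sketch of how that option can be set (the 128 MiB value is an arbitrary example, not a tuned or verified recommendation):

```python
import os

# The caching allocator reads PYTORCH_CUDA_ALLOC_CONF when it is first
# initialised, so this must run before the first CUDA allocation.
# 128 MiB is an arbitrary example value, not a recommendation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the env var so the allocator picks it up

print(torch.cuda.is_available())
```

Equivalently, the variable can be exported in the shell before launching the training script.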

We still hit the same error even after setting batch_size to 1 and calling torch.cuda.empty_cache().
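For clarity, this is roughly what that attempt looked like (a simplified, self-contained sketch, not the actual wesep training loop; the LSTM and random data below are dummies standing in for BSRNN and our real batches):

```python
import torch
import torch.nn as nn

# Dummy stand-ins for the real model and data, only to illustrate where
# torch.cuda.empty_cache() was called with batch_size = 1.
model = nn.LSTM(input_size=257, hidden_size=256, batch_first=True).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(3):
    x = torch.randn(1, 400, 257, device="cuda")  # batch_size = 1
    optimizer.zero_grad()
    out, _ = model(x)
    loss = out.pow(2).mean()
    loss.backward()
    optimizer.step()
    # Releases cached allocator blocks, but not memory still held by live
    # tensors (activations, optimizer state), so the OOM can still occur.
    torch.cuda.empty_cache()
```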

Why does this happen?
