
CUDA out of memory when testing on the S3DIS dataset (segmentation) using pointnet.yaml #147

Open
SantiDiazC opened this issue Jan 22, 2024 · 0 comments

SantiDiazC commented Jan 22, 2024

Hi, thanks for your work.

I am trying out the library, so I ran training with pointnet.yaml on the S3DIS dataset (segmentation). Training went fine for 100 epochs with batch_size=2 on an RTX 3080; however, when the testing stage started, I got the following error:

[01/20 04:16:41 S3DIS]: Test [5]/[68] cloud
Test on 5-th cloud [20]/[72]:  28%|████████████████████████████████████████████▍                                                                                                                   | 20/72 [00:02<00:05,  9.00it/s]
Traceback (most recent call last):
  File "examples/segmentation/main.py", line 745, in <module>
    main(0, cfg)
  File "examples/segmentation/main.py", line 308, in main
    test_miou, test_macc, test_oa, test_ious, test_accs, _ = test(model, data_list, cfg)
  File "/home/hri-david/anaconda3/envs/openpoints/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "examples/segmentation/main.py", line 598, in test
    logits = model(data)
  File "/home/hri-david/anaconda3/envs/openpoints/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hri-david/PycharmProjects/Pointnet/PointNeXt/examples/segmentation/../../openpoints/models/segmentation/base_seg.py", line 45, in forward
    p, f = self.encoder.forward_seg_feat(data)
  File "/home/hri-david/PycharmProjects/Pointnet/PointNeXt/examples/segmentation/../../openpoints/models/backbone/pointnet.py", line 170, in forward_seg_feat
    trans = self.stn(x)
  File "/home/hri-david/anaconda3/envs/openpoints/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hri-david/PycharmProjects/Pointnet/PointNeXt/examples/segmentation/../../openpoints/models/backbone/pointnet.py", line 36, in forward
    x = F.relu(self.bn3(self.conv3(x)))
  File "/home/hri-david/anaconda3/envs/openpoints/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hri-david/anaconda3/envs/openpoints/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 179, in forward
    self.eps,
  File "/home/hri-david/anaconda3/envs/openpoints/lib/python3.7/site-packages/torch/nn/functional.py", line 2283, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 876.00 MiB (GPU 0; 9.74 GiB total capacity; 1.28 GiB already allocated; 121.19 MiB free; 3.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
wandb: | 2.905 MB of 2.905 MB uploaded
wandb: Run history:
wandb:       best_val ▁▂▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇██████████████
wandb:    global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:             lr ████████▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb: macc_when_best ▁▂▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇███████████████
wandb:   oa_when_best ▁▁███████████████▆▆▇▇▇▇▇▇▇██████████████
wandb:     train_loss █▅▅▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:     train_macc ▁▃▄▄▅▅▅▅▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇███████████████
wandb:     train_miou ▁▃▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇███████████████
wandb:       val_macc ▃▃▇▄▅▆▃▄▇▆▅▆▄▄▇▆▆▇▆▇▅▁▇▁▆▄█▇▇▆▇▃▇▇▆▆▇▇▅▄
wandb:       val_miou ▄▃▇▄▆▆▃▄▇▆▆▅▃▄▇▅▅▇▅▆▄▂▇▁▆▃█▇▆▅▆▃▇▆▅▆▇▆▅▃
wandb:         val_oa ▆▅█▄▇▇▅▆▇▇▇▆▅▅▇▆▆▇▆▆▅▂▇▁▇▃█▇▇▅▇▃▇▇▆▆▇▇▆▃
wandb: 
wandb: Run summary:
wandb:       best_val 22.63091
wandb:    global_step 100
wandb:             lr 1e-05
wandb: macc_when_best 29.38019
wandb:   oa_when_best 61.35135
wandb:     train_loss 1.55627
wandb:     train_macc 42.63173
wandb:     train_miou 34.23775
wandb:       val_macc 20.69266
wandb:       val_miou 12.51122
wandb:         val_oa 41.35226
wandb: 
wandb: 🚀 View run s3dis-train-pointnet-ngpus1-20240119-195032-Y9EAMrwTdiBMMf9hkLf8 at: https://wandb.ai/dsdiazc/PointNeXt-S3DIS/runs/5cx3w4ln
wandb: ️⚡ View job at https://wandb.ai/dsdiazc/PointNeXt-S3DIS/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEzMTk0MzY1NQ==/version_details/v0
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 2 other file(s)
wandb: Find logs at: ./wandb/run-20240119_195033-5cx3w4ln/logs

Should I make some additional modification to the yaml file to get testing to work on my hardware (RTX 3080, ~10 GiB VRAM)?
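One lead comes from the error message itself: reserved memory (3.16 GiB) is far larger than allocated memory (1.28 GiB), which is the fragmentation pattern PyTorch's allocator hint targets. A minimal sketch of applying that hint before re-running the test stage (the 128 MiB split size is an assumed starting value, not a verified fix for this config):

```shell
# Sketch, assuming the test is relaunched from the same shell.
# max_split_size_mb caps the size of allocator blocks that can be split,
# which can reduce fragmentation when reserved >> allocated memory.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

If fragmentation is the culprit, this alone may let the test pass; if the model genuinely needs more than the card's ~10 GiB for a full cloud, the input itself has to be reduced.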

Thank you in advance!
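Since there are no replies yet, here is a generic mitigation sketch, not PointNeXt's actual test path: the traceback shows the OOM happens inside `model(data)` on a full test cloud, so peak activation memory can be bounded by running inference over slices of the cloud. The helper name and chunk size below are hypothetical, and whether per-chunk predictions can simply be concatenated depends on the model (PointNet's STN, for example, sees only the chunk it is given):

```python
import torch

def infer_in_chunks(model, points, chunk_size=4096):
    # Hypothetical helper: run inference on bounded slices of a (B, N, C)
    # point cloud so peak activation memory scales with chunk_size, not N.
    outs = []
    with torch.no_grad():
        for start in range(0, points.shape[1], chunk_size):
            chunk = points[:, start:start + chunk_size, :]
            outs.append(model(chunk))
            # Return freed blocks to the driver between chunks (optional).
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
    # Reassemble per-chunk outputs along the point dimension.
    return torch.cat(outs, dim=1)
```

Alternatively, lowering the number of points fed to the test loop via the yaml (if the config exposes a voxel/crop size for testing) would have a similar effect without touching code.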
