
36 GB minimum GPU memory required using batch size 1 and fp16 mixed precision training? #61

Open
danielvegamyhre opened this issue Jul 16, 2024 · 9 comments

Comments

@danielvegamyhre
Contributor

It seems 16 GB of GPU memory is not enough; I get a CUDA out-of-memory error immediately, and in the Colab resource monitor I can see GPU memory spike to the max before the crash.

So, to estimate how much GPU VRAM would be required, I first summed the total model parameters for CLIPVisionModelWithProjection, AutoencoderKLTemporalDecoder, and UNetSpatioTemporalConditionModel:

total params: 2254442729

Next, I multiplied the model params by (2 + 2 + 12) bytes per parameter. These numbers come from:

  • 2 bytes for the fp16 copy of the model params (used in fp16 mixed precision training)
  • 2 bytes for the fp16 model gradients (used in fp16 mixed precision training)
  • 12 bytes for optimizer state (with Adam: 4 bytes each for the fp32 master copy of the parameter, the momentum, and the variance)

Multiplying this out, I get 2254442729 * (2 + 2 + 12) = 36071083664 bytes, which is ~36 GB of GPU memory required to fine-tune with a batch size of 1 and fp16 mixed precision training.
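
For reference, here is a minimal sketch of that estimate, assuming the stabilityai/stable-video-diffusion-img2vid-xt checkpoint and its standard image_encoder / vae / unet subfolders (adjust if you load a different checkpoint):

    # Sketch: re-derive the parameter count and the (2 + 2 + 12)-bytes-per-param estimate.
    from diffusers import AutoencoderKLTemporalDecoder, UNetSpatioTemporalConditionModel
    from transformers import CLIPVisionModelWithProjection

    repo = "stabilityai/stable-video-diffusion-img2vid-xt"  # assumed checkpoint
    image_encoder = CLIPVisionModelWithProjection.from_pretrained(repo, subfolder="image_encoder")
    vae = AutoencoderKLTemporalDecoder.from_pretrained(repo, subfolder="vae")
    unet = UNetSpatioTemporalConditionModel.from_pretrained(repo, subfolder="unet")

    total_params = sum(p.numel() for m in (image_encoder, vae, unet) for p in m.parameters())
    bytes_per_param = 2 + 2 + 12  # fp16 weights + fp16 grads + Adam state (fp32 copy, momentum, variance)

    print(f"total params: {total_params}")
    print(f"estimated training memory: {total_params * bytes_per_param / 1e9:.1f} GB "
          f"(weights/grads/optimizer only, no activations)")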

Is this accurate?

@christopher-beckham

Not the repo author, but I had some related concerns; see my comment and the one below it here: #31 (comment)

For starters, the entire UNet is stored in fp32 in the script, because for some reason this cast is commented out:

https://github.com/pixeli99/SVD_Xtend/blob/main/train_svd.py#L730

Also, the number of frames it defaults to training on is 25, which can really blow up your GPU memory.
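
For concreteness, the cast in question usually looks like the standard diffusers training-script pattern sketched below (names such as `weight_dtype` and `accelerator` follow that convention and are not copied verbatim from this repo):

    # Typical diffusers-style casting block (sketch, not verbatim from train_svd.py).
    weight_dtype = torch.float16 if accelerator.mixed_precision == "fp16" else torch.float32

    # Frozen modules can safely live in fp16.
    image_encoder.to(accelerator.device, dtype=weight_dtype)
    vae.to(accelerator.device, dtype=weight_dtype)

    # This is the kind of line that was commented out; without it the whole UNet
    # stays in fp32. (See the later comments for keeping only the trainable
    # params in fp32.)
    unet.to(accelerator.device, dtype=weight_dtype)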

@danielvegamyhre
Contributor Author

> Not the repo author, but I had some related concerns; see my comment and the one below it here: #31 (comment)
>
> For starters, the entire UNet is stored in fp32 in the script, because for some reason this cast is commented out:
>
> https://github.com/pixeli99/SVD_Xtend/blob/main/train_svd.py#L730
>
> Also, the number of frames it defaults to training on is 25, which can really blow up your GPU memory.

Thanks, this is helpful. I replied in the thread you linked with some follow-up questions.

@pixeli99
Owner

I'm sorry; when I first wrote this code, I was more focused on supporting SVD training and didn't think much about memory usage, which has caused everyone some inconvenience. As @christopher-beckham mentioned, this line of code should not have been commented out. I have fixed this issue.

@KhaledButainy

> I'm sorry; when I first wrote this code, I was more focused on supporting SVD training and didn't think much about memory usage, which has caused everyone some inconvenience. As @christopher-beckham mentioned, this line of code should not have been commented out. I have fixed this issue.

I uncommented the following line:
https://github.com/pixeli99/SVD_Xtend/blob/main/train_svd.py#L739

Is this the correct line? What other lines did you modify?

Thank you in advance.

@christopher-beckham

You can use the following to cast everything to fp16 except the trainable params.

    unet.requires_grad_(True)
    parameters_list = []

    # Customize which parameters are trained; adjust the name filter below as needed.
    for name, para in unet.named_parameters():
        if 'temporal_transformer_block' in name:
            parameters_list.append(para)
            para.requires_grad = True
            para.data = para.data.to(dtype=torch.float32)
        else:
            para.requires_grad = False

@KhaledButainy

KhaledButainy commented Jul 23, 2024

> You can use the following to cast everything to fp16 except the trainable params.
>
>     unet.requires_grad_(True)
>     parameters_list = []
>
>     # Customize which parameters are trained; adjust the name filter below as needed.
>     for name, para in unet.named_parameters():
>         if 'temporal_transformer_block' in name:
>             parameters_list.append(para)
>             para.requires_grad = True
>             para.data = para.data.to(dtype=torch.float32)
>         else:
>             para.requires_grad = False

Thank you for sharing.

What do you think about keeping this line commented:
https://github.com/pixeli99/SVD_Xtend/blob/main/train_svd.py#L739

and cast only the frozen parameters:

    unet.requires_grad_(True)
    parameters_list = []

    # Customize which parameters are trained; adjust the name filter below as needed.
    for name, para in unet.named_parameters():
        if 'temporal_transformer_block' in name:
            parameters_list.append(para)
            para.requires_grad = True
        else:
            para.requires_grad = False
            para.data = para.data.to(dtype=weight_dtype) # torch.float16

This way we don't lose model precision by downcasting to float16 and then upcasting to float32 again.

@christopher-beckham

christopher-beckham commented Jul 23, 2024 via email

@KhaledButainy

You can also enable --gradient-checkpointing to save more GPU memory. However, this might result in slower training.
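
For reference, gradient checkpointing can be enabled either via the script flag or directly on the model; this is a sketch, and the flag spelling (`--gradient_checkpointing`, following the diffusers example scripts) has not been verified against this repo:

    # diffusers models expose a helper for activation/gradient checkpointing:
    unet.enable_gradient_checkpointing()

    # Or from the command line (flag spelling assumed from diffusers examples):
    #   accelerate launch train_svd.py --gradient_checkpointing ...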

@KhaledButainy

KhaledButainy commented Jul 23, 2024

> No. You don't want the untrained params in f32. You're trying to save memory on the GPU.

We are not doing that: untrained params will be in fp16, and only the trained params are in fp32, as you suggested.

In my comment, I suggested keeping the model in fp32 and downcasting only the frozen params to fp16, instead of downcasting the full model to fp16 and then upcasting the trainable params back to fp32.

Both save the same amount of GPU memory, but with the second approach you lose some precision on the trainable params.
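
To make the precision point concrete, here is a small, self-contained PyTorch check (not tied to the training script) of what an fp32 -> fp16 -> fp32 round trip does to weights:

    import torch

    torch.manual_seed(0)
    w_fp32 = torch.randn(1_000_000)  # stand-in for pretrained fp32 weights

    # The "cast everything to fp16, then upcast the trainable params" approach:
    w_roundtrip = w_fp32.to(torch.float16).to(torch.float32)

    err = (w_fp32 - w_roundtrip).abs()
    print(f"max abs error:  {err.max().item():.2e}")
    print(f"mean abs error: {err.mean().item():.2e}")
    # The round-tripped weights retain only fp16 precision (~3 significant
    # decimal digits), which is the loss described above.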
