
cuda out of memory error #1

Open
benam2 opened this issue Jul 20, 2023 · 2 comments

Comments

@benam2

benam2 commented Jul 20, 2023

Hey, thanks for sharing your code!

I'm trying to fine-tune the model on my dataset, but I keep getting a CUDA out-of-memory error, even though my GPU has 24 GB of memory. Any ideas on what might be causing this problem?

Thank You!
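For reference, a quick way to separate a genuine OOM from a library bug is to check how much GPU memory is actually free before the model loads. A minimal sketch (assumes torch is installed; degrades gracefully if no GPU is visible):

```python
# Sketch: report free vs. total GPU memory before loading the model,
# to tell a genuine OOM apart from a version-related memory regression.
try:
    import torch
except ImportError:  # torch not installed in this environment
    torch = None

if torch is None or not torch.cuda.is_available():
    msg = "no CUDA device visible"
else:
    # mem_get_info returns (free_bytes, total_bytes) for the current device
    free, total = torch.cuda.mem_get_info()
    msg = f"free: {free / 1e9:.1f} GB / total: {total / 1e9:.1f} GB"
print(msg)
```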

@georgesung
Owner

georgesung commented Jul 20, 2023

Are you using Python 3.7? I noticed that with the latest releases of transformers and peft there are incompatibility issues on Python 3.7. I updated the README with a troubleshooting section to address this, copied below. You can either use the workaround below or use Python 3.8+.
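A one-liner sketch to check which case applies on your machine (the printed messages are illustrative, not from the README):

```python
import sys

# transformers >= 4.31.0 dropped Python 3.7 support, so the interpreter
# version decides whether the pinned workaround is needed
needs_pin = sys.version_info < (3, 8)
print("pin transformers==4.30.2 + older peft" if needs_pin
      else "latest transformers/peft should work")
```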

Issues with python 3.7

If you're using Python 3.7, pip will install transformers 4.30.x, since transformers >= 4.31.0 no longer supports Python 3.7. If you then install the latest version of peft, GPU memory consumption will be higher than usual. The workaround is to pin an older version of peft to match the older transformers version you installed. Update your requirements.txt as follows:

transformers==4.30.2
git+https://github.com/huggingface/peft.git@86290e9660d24ef0d0cedcf57710da249dd1f2f4

Of course, make sure to remove the original transformers and peft lines first, then run pip install -r requirements.txt.
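The swap can also be scripted. A sketch (the sample requirements.txt written here stands in for the repo's real file, which is assumed to list transformers and peft unpinned; sed -i is GNU sed, on macOS use sed -i ''):

```shell
# Stand-in for the repo's real requirements.txt
cat > requirements.txt <<'EOF'
transformers
peft
EOF

# Pin transformers and point peft at the known-good commit
sed -i 's|^transformers.*|transformers==4.30.2|' requirements.txt
sed -i 's|^peft.*|git+https://github.com/huggingface/peft.git@86290e9660d24ef0d0cedcf57710da249dd1f2f4|' requirements.txt
cat requirements.txt
```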

@bhuvneshsaini

I am fine-tuning on Kaggle with Python 3.10, and I used:

transformers==4.30.2
git+https://github.com/huggingface/peft.git@86290e9660d24ef0d0cedcf57710da249dd1f2f4

But I'm still not able to start training:

Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_42/3053083764.py", line 29, in
trainer.load_base_model()
File "/kaggle/input/llama-finetune1/llm_qlora-main/QloraTrainer.py", line 34, in load_base_model
model = LlamaForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3694, in from_pretrained
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 786, in _load_state_dict_into_meta_model
File "/opt/conda/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 108, in set_module_quantized_tensor_to_device
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2105, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "/opt/conda/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1428, in structured_traceback
return FormattedTB.structured_traceback(
File "/opt/conda/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1319, in structured_traceback
return VerboseTB.structured_traceback(
File "/opt/conda/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1172, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "/opt/conda/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1087, in format_exception_as_a_whole
frames.append(self.format_record(record))
File "/opt/conda/lib/python3.10/site-packages/IPython/core/ultratb.py", line 969, in format_record
frame_info.lines, Colors, self.has_colors, lvals
File "/opt/conda/lib/python3.10/site-packages/IPython/core/ultratb.py", line 792, in lines
return self._sd.lines
File "/opt/conda/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/opt/conda/lib/python3.10/site-packages/stack_data/core.py", line 734, in lines
pieces = self.included_pieces
File "/opt/conda/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/opt/conda/lib/python3.10/site-packages/stack_data/core.py", line 681, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "/opt/conda/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/opt/conda/lib/python3.10/site-packages/stack_data/core.py", line 660, in executing_piece
return only(
File "/opt/conda/lib/python3.10/site-packages/executing/executing.py", line 190, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0
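Note that the underlying error here is a CUDA device-side assert, not an OOM, and the PyTorch message in the traceback suggests the first debugging step: launch kernels synchronously so the stack trace points at the real failing call. A minimal sketch; the variable must be set before torch (or transformers) is imported:

```python
import os

# Force synchronous CUDA kernel launches so the traceback identifies
# the actual failing operation instead of a later API call
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```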
