Summary of Common Error Messages

torch.cuda.OutOfMemoryError: CUDA out of memory.

Cause: insufficient GPU memory. Reduce the batch size or use quantized fine-tuning.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Cause: multi-GPU training was started without an Accelerate or DeepSpeed launch script.

RuntimeError: unscale_() has already been called on this optimizer since the last update().

Cause: the installed dependencies are too old. Update them with pip install -q git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/accelerate.git
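Before reinstalling, it can help to see which versions are currently installed. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is illustrative and is not part of peft, transformers, or accelerate.

```python
from importlib.metadata import PackageNotFoundError, version

# The three packages named in the update command above.
PACKAGES = ("peft", "transformers", "accelerate")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is absent from the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it in the same environment that launches training; a different virtual environment may report different versions.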

AttributeError: 'ChatGLMTokenizer' object has no attribute ***

Cause: the local ChatGLM-6B model files are outdated. Download the latest ChatGLM-6B model again.

AttributeError: 'ChatGLMModel' object has no attribute ***

Cause: the local ChatGLM-6B model files are outdated. Download the latest ChatGLM-6B model again.

ValueError: Undefined dataset *** in dataset_info.json.

Cause: the specified dataset is not defined in dataset_info.json.
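A quick way to spot a misspelled or unregistered dataset name is to compare it against the file's entries. The sketch below assumes that dataset_info.json is a JSON object whose top-level keys are the dataset names; the function name `undefined_datasets` and the default path are hypothetical.

```python
import json


def undefined_datasets(requested, info_path="dataset_info.json"):
    """Return the names in `requested` that are not top-level keys of the file.

    Assumption: dataset_info.json is a JSON object keyed by dataset name.
    """
    with open(info_path, encoding="utf-8") as f:
        defined = json.load(f)
    return [name for name in requested if name not in defined]
```

An empty list means every requested name is defined; anything returned needs an entry in dataset_info.json or a spelling fix.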

ValueError: The fine-tuning arguments are not found in the provided dictionary.

Cause: the path to the model checkpoint directory is incorrect.

RuntimeError: expected scalar type Half but found Float

Cause: the --fp16 argument was not specified for fine-tuning.

ValueError: FP16 Mixed precision training with AMP or APEX can only be used on CUDA devices.

Cause: no GPU device was found. Make sure the CUDA environment is configured correctly.
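To confirm what PyTorch can actually see, a short check such as the following may help. It is a sketch, not part of this project; the function name `cuda_report` is illustrative, and the code falls back gracefully when torch is not installed.

```python
def cuda_report():
    """Summarize what PyTorch can see, as a dict of plain values."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing from this environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False while `built_with_cuda` is not None, the driver or environment is the more likely culprit; if `built_with_cuda` is None, a CPU-only PyTorch build is installed.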

RuntimeError: Expected to mark a variable ready only once.

Cause: --ddp_find_unused_parameters False was not specified when running distributed fine-tuning with the LoRA method.

RuntimeError: Internal: src/sentencepiece_processor.cc(1101)

Cause: the ice_text.model file is missing or corrupted. Download it again.
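A first sanity check is whether the file exists in the model directory and is not empty. This sketch only catches those two cases; a non-empty file can still be corrupted, so a fresh download remains the reliable fix. The function name `check_tokenizer_file` is hypothetical.

```python
from pathlib import Path


def check_tokenizer_file(model_dir, filename="ice_text.model"):
    """Return 'missing', 'empty', or 'present' for the file inside model_dir."""
    path = Path(model_dir) / filename
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    # Present and non-empty; this does not prove the contents are intact.
    return "present"
```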

ImportError: cannot import name 'WEIGHTS_NAME' from 'peft.utils.other'

AttributeError: 'ChatGLMForConditionalGeneration' object has no attribute 'merge_and_unload'

Cause: the installed peft library is not the latest version. Update peft.

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Cause: the model was not loaded onto the GPU correctly. Check the CUDA environment.

RuntimeError: Expected is_sm80 to be true, but got false

Cause: the PyTorch version does not match the CUDA version. Reinstall PyTorch with pip install --force-reinstall --pre torch --index-url https://download.pytorch.org/whl/nightly/cu117

ModuleNotFoundError: No module named 'transformers_modules.'

Cause: 1. The ChatGLM version is not the latest. 2. If a local ChatGLM path is specified, use an absolute path rather than a relative one.
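For the second cause, a relative path can be converted to an absolute one before it is passed to the training script. A minimal sketch with the standard library; the function name `to_absolute_model_path` and the example directory name are illustrative.

```python
from pathlib import Path


def to_absolute_model_path(model_path):
    """Expand ~ and resolve a local model directory to an absolute path string."""
    return str(Path(model_path).expanduser().resolve())


if __name__ == "__main__":
    # "./chatglm-6b" is a placeholder for wherever the model was downloaded.
    print(to_absolute_model_path("./chatglm-6b"))
```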

Training does not work on Mac

Cause: training is not currently supported on macOS. Use Windows or Linux with a CUDA-capable GPU.
