
t5 notebook broken with transformer-deploy 0.5.0 #130

Open · michaelroyzen opened this issue Aug 20, 2022 · 3 comments
michaelroyzen commented Aug 20, 2022

The t5.ipynb notebook breaks when run with the transformer-deploy 0.5.0 Docker container. Specifically, with

from typing import Dict
import gc
import random

import torch

# tokenizer, encoder_model_path, encoder_fp16_model_path and the helpers
# get_keep_fp32_nodes / convert_fp16 / save_onnx are defined in earlier
# cells of the notebook.

def get_random_input_encoder() -> Dict[str, torch.Tensor]:
    # Random batch whose total token count stays close to max_seq.
    max_seq = 128
    seq_len = random.randint(a=1, b=max_seq)
    batch = max_seq // seq_len
    random_input_ids = torch.randint(
        low=0, high=tokenizer.vocab_size, size=(batch, seq_len), dtype=torch.int32, device="cuda"
    )
    inputs = {"input_ids": random_input_ids}
    return inputs


# Find the nodes that must stay in FP32, convert the rest of the graph
# to FP16, and save the mixed-precision encoder.
keep_fp32_encoder = get_keep_fp32_nodes(onnx_model_path=encoder_model_path, get_input=get_random_input_encoder)
assert len(keep_fp32_encoder) > 0
enc_model_onnx = convert_fp16(onnx_model=encoder_model_path, nodes_to_exclude=keep_fp32_encoder)
save_onnx(proto=enc_model_onnx, model_path=encoder_fp16_model_path, clean=False)

del enc_model_onnx
torch.cuda.empty_cache()
gc.collect()

I get

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor onnx::MatMul_2637 failed.corrupted protobuf data: tensor shape size(4194304) does not match the data size(0) in proto
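A data size of 0 in that message usually means the initializer's raw bytes are missing from the protobuf, which can happen when a model saved with external data (as large T5 exports are) is loaded without its side files. A minimal check, assuming encoder_fp16_model_path from the snippet above:

import onnx

# Load only the graph structure; skip resolving external weight files.
model = onnx.load(encoder_fp16_model_path, load_external_data=False)

external = [
    init.name
    for init in model.graph.initializer
    if init.data_location == onnx.TensorProto.EXTERNAL
]
print(f"{len(external)} initializers stored as external data, e.g. {external[:3]}")
# If this list is non-empty, the companion data file(s) must sit next to the
# .onnx file when InferenceSession loads it; otherwise deserialization fails
# with exactly this "data size(0)" error.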

@pommedeterresautee

david-rx commented

Anyone have a workaround here?

caffeinetoomuch commented

Exporting the encoder (T5-3b) without mixed precision also seems to give a similar error:

Traceback (most recent call last):
  ...
    encoder_onnx = create_model_for_provider(encoder_onnx_path_to_compare, "CUDAExecutionProvider", log_severity=3)
  File "/workspace/transformer-deploy/src/transformer_deploy/backends/ort_utils.py", line 85, in create_model_for_provider
    return InferenceSession(path, options, providers=provider_to_use)
  File "/home/af/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/af/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor embed_tokens.weight failed.corrupted protobuf data: tensor shape size(32876544) does not match the data size(0) in proto

I was using torch==1.12.1+cu116, onnx==1.12.0, and onnxruntime-gpu==1.12.1, running on the same NVIDIA driver (515.65.01, CUDA 11.7) as the notebook. Could you share the library versions with which the notebook ran successfully?

michaelroyzen (Author) commented

I temporarily solved the issue by using the 8bfe4f5 commit of transformer-deploy together with PyTorch 1.11, onnx==1.12.0, and onnxruntime==1.12.0.
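To confirm you are on that combination before re-running the notebook (the transformer-deploy commit itself has to be pinned separately, e.g. by checking out 8bfe4f5), a quick sanity check:

import onnx
import onnxruntime
import torch

# Known-good combination reported above.
print(torch.__version__)        # expect 1.11.x
print(onnx.__version__)         # expect 1.12.0
print(onnxruntime.__version__)  # expect 1.12.0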
