
t5 notebook broken with transformer-deploy 0.5.0 #130

Open · michaelroyzen opened this issue Aug 20, 2022 · 3 comments
michaelroyzen commented Aug 20, 2022

The t5.ipynb notebook breaks when run with the transformer-deploy 0.5.0 Docker container. Specifically, with

from typing import Dict
import gc
import random

import torch

# tokenizer, encoder_model_path, encoder_fp16_model_path and the helpers
# get_keep_fp32_nodes / convert_fp16 / save_onnx are defined in earlier
# cells of the notebook.

def get_random_input_encoder() -> Dict[str, torch.Tensor]:
    # Random batch whose total token count stays close to max_seq.
    max_seq = 128
    seq_len = random.randint(a=1, b=max_seq)
    batch = max_seq // seq_len
    random_input_ids = torch.randint(
        low=0, high=tokenizer.vocab_size, size=(batch, seq_len), dtype=torch.int32, device="cuda"
    )
    inputs = {"input_ids": random_input_ids}
    return inputs


# Find the nodes that must stay in FP32, convert the rest of the graph
# to FP16, and save the mixed-precision encoder.
keep_fp32_encoder = get_keep_fp32_nodes(onnx_model_path=encoder_model_path, get_input=get_random_input_encoder)
assert len(keep_fp32_encoder) > 0
enc_model_onnx = convert_fp16(onnx_model=encoder_model_path, nodes_to_exclude=keep_fp32_encoder)
save_onnx(proto=enc_model_onnx, model_path=encoder_fp16_model_path, clean=False)

del enc_model_onnx
torch.cuda.empty_cache()
gc.collect()

I get

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor onnx::MatMul_2637 failed.corrupted protobuf data: tensor shape size(4194304) does not match the data size(0) in proto
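A data size of 0 in that message usually means the initializer's raw bytes are missing from the protobuf, which can happen when a model saved with external data (as large T5 exports are) is loaded without its side files. A minimal check, assuming encoder_fp16_model_path from the snippet above:

import onnx

# Load only the graph structure; skip resolving external weight files.
model = onnx.load(encoder_fp16_model_path, load_external_data=False)

external = [
    init.name
    for init in model.graph.initializer
    if init.data_location == onnx.TensorProto.EXTERNAL
]
print(f"{len(external)} initializers stored as external data, e.g. {external[:3]}")
# If this list is non-empty, the companion data file(s) must sit next to the
# .onnx file when InferenceSession loads it; otherwise deserialization fails
# with exactly this "data size(0)" error.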

@pommedeterresautee

david-rx commented

Anyone have a workaround here?

caffeinetoomuch commented

Exporting the encoder (T5-3b) without mixed precision also seems to give a similar error:

Traceback (most recent call last):
  ...
    encoder_onnx = create_model_for_provider(encoder_onnx_path_to_compare, "CUDAExecutionProvider", log_severity=3)
  File "/workspace/transformer-deploy/src/transformer_deploy/backends/ort_utils.py", line 85, in create_model_for_provider
    return InferenceSession(path, options, providers=provider_to_use)
  File "/home/af/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/af/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor embed_tokens.weight failed.corrupted protobuf data: tensor shape size(32876544) does not match the data size(0) in proto

I was using torch==1.12.1+cu116, onnx==1.12.0, and onnxruntime-gpu==1.12.1, running on the same NVIDIA driver (515.65.01, CUDA 11.7) as the notebook. Could you share the library versions with which the notebook ran successfully?

michaelroyzen (Author) commented

I temporarily solved the issue by using the 8bfe4f5 commit of transformer-deploy together with PyTorch 1.11, onnx==1.12.0, and onnxruntime==1.12.0.
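To confirm you are on that combination before re-running the notebook (the transformer-deploy commit itself has to be pinned separately, e.g. by checking out 8bfe4f5), a quick sanity check:

import onnx
import onnxruntime
import torch

# Known-good combination reported above.
print(torch.__version__)        # expect 1.11.x
print(onnx.__version__)         # expect 1.12.0
print(onnxruntime.__version__)  # expect 1.12.0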
