I have hand-written a Transformer model that includes three parts: self-attention, cross-attention, and an MLP. The full model runs on the NPU, but when I run only the cross-attention part, the following error occurs:

```
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation
```
Step-by-step reproduction
My cross-attention code is here:

```python
import torch
import torch.nn as nn

class cross_block(nn.Module):
    def __init__(self, hidden_size=1200, num_heads=16):
        super(cross_block, self).__init__()
        self.head_dim = hidden_size // num_heads
        self.dim = hidden_size
        self.d_model = hidden_size
        self.num_heads = num_heads
        # batch_first=False (default): inputs are (seq_len, batch, embed_dim)
        self.mha = nn.MultiheadAttention(embed_dim=self.d_model, num_heads=self.num_heads)

    def cross_attn(self, q, k, v):
        N, B, C = q.shape
        x, output_weights = self.mha(q, k, v)
        x = x.view(2, N // 2, C)  # just for testing; assumes B == 1 and N is even
        return x

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.cross_attn(q, k, v)
```
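For reference, a minimal invocation of a block like the one above. The input shapes are my assumption (the reporter's actual `example_input` is not shown), and the class is redefined compactly so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# Compact redefinition of the reporter's cross_block so this snippet is self-contained.
class cross_block(nn.Module):
    def __init__(self, hidden_size=1200, num_heads=16):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=hidden_size, num_heads=num_heads)

    def forward(self, q, k, v):
        N, B, C = q.shape          # (seq_len, batch, embed_dim) with batch_first=False
        x, _ = self.mha(q, k, v)
        return x.view(2, N // 2, C)  # "just for testing": requires B == 1 and even N

# Hypothetical shapes: query length 4, key/value length 6, batch 1, embed dim 1200.
model = cross_block().eval()
q = torch.randn(4, 1, 1200)
k = torch.randn(6, 1, 1200)
v = torch.randn(6, 1, 1200)
with torch.no_grad():
    out = model(q, k, v)
print(out.shape)  # torch.Size([2, 2, 1200])
```

Note that the `view(2, N // 2, C)` reshape only succeeds when the batch dimension is 1 and the query length is even; otherwise the element counts do not match.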
```
Traceback (most recent call last):
  File "/home/mla/model.py", line 50, in <module>
    t = compiled_model(example_input)
  File "/home/xxx/anaconda3/envs/env1/lib/python3.10/site-packages/openvino/runtime/ie_api.py", line 388, in __call__
    return self._infer_request.infer(
  File "/home/xxx/anaconda3/envs/env1/lib/python3.10/site-packages/openvino/runtime/ie_api.py", line 132, in infer
    return OVDict(super().infer(_data_dispatch(
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation
```
Issue submission checklist
- [x] I'm reporting an issue. It's not a question.
- [x] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [x] There is reproducer code and related data files such as images, videos, models, etc.
@Zctoylm0927 thanks for reaching out. Do you observe the same behavior on the latest 2024.4 release or a nightly release? If you can, please share a minimal sample reproducer and the IR model. Also, please provide the NPU driver version you are using.
Thanks for the reply. I have tried the 2024.4 release, and the same error still occurs.
I only shared the .xml file before; I have now uploaded the .bin file together with it: cross.zip
I think my NPU driver version is v1.6.0, since it matches the release date.
OpenVINO Version
2024.3
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
NPU
Framework
PyTorch
Model used
torch.nn.MultiheadAttention
This is followed by my conversion code.
When I try to use the converted OV cross block, the problem occurs at:
t = compiled_model(example_input)
When I use the original PyTorch model, there is no such problem. And here is my cross block XML:
cross.xml.txt