The latest WeNet code fails to run #2015
Comments
Running the s0 recipe fails as well.
This may have been caused by PR #2009.
OK, thanks.
Once you get it working, feel free to open a PR.
I've just pulled the latest code, and unexpectedly WeNet fails to run for me as well.
@gengxuelong could you fix the JIT export error? It is likely caused by the recent pad_list change.
OK. I'm not very familiar with JIT yet, but I'll do my best.
xingchensong pushed a commit that referenced this issue on Sep 19, 2023:
* [fix] Handle the case in utils/common.py where pad_list did not account for extra dimensions after the time dimension
* [fix] Handle the case in utils/common.py where pad_list did not account for extra dimensions after the time dimension (#2007)
* [fix] Partially fix the JIT error. Preliminary diagnosis is that it is caused by the dynamic tensor expression `*(xs[0].shape[1:])`; for now, update the comment in common.py/pad_list, drop support for extra dimensions after the time dimension, and restore the code to a runnable state (issue #2015)
* [fix] Fully fix the JIT error: support extra dimensions after the time dimension under JIT constraints (#2015)
Fixed in #2018.
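For reference, one TorchScript-friendly way to write `pad_list` is to build the target shape as an explicit `List[int]` instead of star-unpacking `xs[0].shape[1:]`. This is only a sketch of the idea; it is not necessarily the exact code merged in #2018:

```python
import torch
from typing import List


def pad_list(xs: List[torch.Tensor], pad_value: float) -> torch.Tensor:
    """Pad a list of tensors along dim 0 to the longest length.

    TorchScript cannot statically infer the size of a star-unpacked
    sequence inside a call, so the target shape is assembled as an
    explicit List[int] before being passed to torch.zeros.
    """
    max_len = max([x.size(0) for x in xs])
    shape = [len(xs), max_len]
    for d in xs[0].size()[1:]:  # trailing dims after the time dim, if any
        shape.append(d)
    pad_res = torch.zeros(shape, dtype=xs[0].dtype, device=xs[0].device)
    pad_res.fill_(pad_value)
    for i in range(len(xs)):
        pad_res[i, :xs[i].size(0)] = xs[i]
    return pad_res


# The function compiles under torch.jit.script:
scripted_pad_list = torch.jit.script(pad_list)
```

The key difference from the failing version is that `torch.zeros` receives a single `List[int]` argument, which TorchScript handles, rather than a variable number of positional size arguments.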
Using the latest code with the default configuration, step 4 fails with the following error:
the number of model params: 53,006,116
/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in __init__. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in torch.jit.Attribute.
  warnings.warn("The TorchScript type system doesn't support "
Traceback (most recent call last):
File "wenet/bin/train.py", line 448, in <module>
main()
File "wenet/bin/train.py", line 283, in main
script_model = torch.jit.script(model)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_script.py", line 1286, in script
return torch.jit._recursive.create_script_module(
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 476, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 894, in compile_unbound_method
create_methods_and_properties_from_stubs(concrete_type, (stub,), ())
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 863, in try_compile_fn
return torch.jit.script(fn, _rcb=rcb)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_script.py", line 1343, in script
fn = torch._C._jit_script_compile(
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 863, in try_compile_fn
return torch.jit.script(fn, _rcb=rcb)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/jit/_script.py", line 1343, in script
fn = torch._C._jit_script_compile(
RuntimeError:
cannot statically infer the expected size of a list in this context:
File "/home/lsj/zdb/wenet-new/wenet/wenet/utils/common.py", line 48
max_len = max([len(item) for item in xs])
batchs = len(xs)
pad_res = torch.zeros(batchs, max_len, *(xs[0].shape[1:]),
~~~~~~~~~~~~~~~~ <--- HERE
dtype=xs[0].dtype, device=xs[0].device)
pad_res.fill_(pad_value)
'pad_list' is being compiled since it was called from 'add_sos_eos'
File "/home/lsj/zdb/wenet-new/wenet/wenet/utils/common.py", line 133
ys_in = [torch.cat([_sos, y], dim=0) for y in ys]
ys_out = [torch.cat([y, _eos], dim=0) for y in ys]
return pad_list(ys_in, eos), pad_list(ys_out, ignore_id)
~~~~~~~~~~~~~~~~~~~ <--- HERE
'add_sos_eos' is being compiled since it was called from 'Transducer._calc_att_loss'
File "/home/lsj/zdb/wenet-new/wenet/wenet/transformer/asr_model.py", line 144
ys_pad_lens: torch.Tensor,
) -> Tuple[torch.Tensor, float]:
ys_in_pad, ys_out_pad = add_sos_eos(ys_pad, self.sos, self.eos,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
self.ignore_id)
~~~~~~~~~~~~~~ <--- HERE
ys_in_lens = ys_pad_lens + 1
'Transducer._calc_att_loss' is being compiled since it was called from 'Transducer.forward'
File "/home/lsj/zdb/wenet-new/wenet/wenet/transducer/transducer.py", line 129
loss_att: Optional[torch.Tensor] = None
if self.attention_decoder_weight != 0.0 and self.decoder is not None:
loss_att, _ = self._calc_att_loss(encoder_out, encoder_mask, text,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
text_lengths)
~~~~~~~~~~~~ <--- HERE
Traceback (most recent call last):
File "wenet/bin/train.py", line 448, in <module>
main()
File "wenet/bin/train.py", line 310, in main
model = torch.nn.parallel.DistributedDataParallel(
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 655, in __init__
_verify_param_shape_across_processes(self.process_group, parameters)
File "/home/lsj/.conda/envs/wenet/lib/python3.8/site-packages/torch/distributed/utils.py", line 112, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [183.175.12.69]:47490
(The same traceback repeats for each of the remaining worker processes, differing only in the peer port number.)
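The JIT failure in the log can be reproduced in isolation. A minimal sketch, assuming a recent PyTorch install: the function runs fine in eager mode, and only scripting it fails.

```python
import torch
from typing import List


def bad_pad(xs: List[torch.Tensor]) -> torch.Tensor:
    max_len = max([x.size(0) for x in xs])
    # Star-unpacking a runtime-dependent shape is what TorchScript rejects
    # with "cannot statically infer the expected size of a list".
    return torch.zeros(len(xs), max_len, *(xs[0].shape[1:]))


xs = [torch.ones(2, 3), torch.ones(4, 3)]
print(bad_pad(xs).shape)  # eager mode works

try:
    torch.jit.script(bad_pad)
except RuntimeError as e:
    print("scripting failed:", type(e).__name__)
```

The Gloo "Connection closed by peer" errors from the other ranks are secondary: each worker hits the same scripting failure (or loses its peer), so the DDP rendezvous collapses.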