
How to obtain the syntax vectors of a dataset after encoding with the syntax-guided encoder? #37

Open
wrq9 opened this issue Sep 11, 2024 · 3 comments

wrq9 commented Sep 11, 2024

Hello. If I only want the vectors obtained by parsing the data in a dataset with GOPar and then encoding it with the syntax-guided encoder, how should I do that?

HillZhang1999 commented (Owner)

You can hack the fairseq code a bit; for example, you can get them here:
https://github.com/HillZhang1999/SynGEC/blob/main/src/src_syngec/syngec_model/syntax_enhanced_transformer.py#L562
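A minimal sketch of one way to follow this suggestion without editing the source files: wrap the method that produces the syntax-guided representation at runtime and record what it returns. Everything here is illustrative. `capture_outputs`, `ToySyntaxEncoder`, `ToyModel`, and the `encode` method are hypothetical stand-ins, and which SynGEC module or method actually yields the vectors near the linked line is an assumption to confirm in `syntax_enhanced_transformer.py`. For `torch.nn.Module` sub-modules, PyTorch's `register_forward_hook` is the more conventional way to capture outputs.

```python
from contextlib import contextmanager


@contextmanager
def capture_outputs(obj, method_name):
    """Temporarily wrap obj.<method_name> and collect every value it returns."""
    had_instance_attr = method_name in vars(obj)
    original = getattr(obj, method_name)  # bound method (or instance attribute)
    captured = []

    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        captured.append(result)
        return result

    setattr(obj, method_name, wrapper)  # instance attribute shadows the class method
    try:
        yield captured
    finally:
        if had_instance_attr:
            setattr(obj, method_name, original)
        else:
            delattr(obj, method_name)  # fall back to the class-level method again


class ToySyntaxEncoder:
    """Hypothetical stand-in for the syntax-guided encoder (not SynGEC code)."""

    def encode(self, token_ids):
        return [t * 2 for t in token_ids]  # pretend these are syntax vectors


class ToyModel:
    """Hypothetical model that calls the encoder inside its forward pass."""

    def __init__(self):
        self.syntax_encoder = ToySyntaxEncoder()

    def forward(self, token_ids):
        syntax_vecs = self.syntax_encoder.encode(token_ids)
        return sum(syntax_vecs)


model = ToyModel()
with capture_outputs(model.syntax_encoder, "encode") as outputs:
    model.forward([1, 2, 3])
print(outputs)  # [[2, 4, 6]]
```

The wrapper is removed when the `with` block exits, so the model behaves exactly as before outside the capture window.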


wrq9 commented Sep 18, 2024


Thanks for the reply. I'd also like to ask: how are `src_outcoming_arc_mask`, `src_incoming_arc_mask`, `src_dpd_matrix`, and `src_probs_matrix` in the code obtained?
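This question is not answered in the thread. Going only by the variable names, one plausible reading (an assumption, not SynGEC's confirmed preprocessing) is that the two arc masks mark each token's dependents and its head, `src_dpd_matrix` holds pairwise distances in the dependency tree, and `src_probs_matrix` holds the parser's arc probabilities. The hypothetical helper `arcs_to_syntax_matrices` sketches the first three under that reading, starting from supar-style head indices. The real code may differ in direction conventions, root handling, distance clipping, or sub-word alignment, so check the repository's data-preparation scripts before relying on it.

```python
from collections import deque


def arcs_to_syntax_matrices(heads):
    """Hypothetical helper: build arc masks and a tree-distance matrix.

    heads: supar-style head indices, 1-based, with 0 marking the root.
    Returns (outgoing, incoming, distance) as nested lists indexed by
    0-based token position. Illustrative interpretation only.
    """
    n = len(heads)
    outgoing = [[False] * n for _ in range(n)]  # outgoing[h][d]: arc h -> d
    incoming = [[False] * n for _ in range(n)]  # incoming[d][h]: d's head is h
    neighbors = [[] for _ in range(n)]
    for dep, head in enumerate(heads):
        if head == 0:  # the root token has no head inside the sentence
            continue
        h = head - 1
        outgoing[h][dep] = True
        incoming[dep][h] = True
        neighbors[h].append(dep)
        neighbors[dep].append(h)

    # BFS from every token over the undirected tree gives hop distances;
    # entries stay -1 only if a token is unreachable (not a valid tree)
    distance = [[-1] * n for _ in range(n)]
    for src in range(n):
        distance[src][src] = 0
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if distance[src][nxt] == -1:
                    distance[src][nxt] = distance[src][cur] + 1
                    queue.append(nxt)
    return outgoing, incoming, distance


# Heads printed for "今天是星期。" later in this issue
out_mask, in_mask, dist = arcs_to_syntax_matrices([2, 3, 0, 5, 3, 3])
print(dist[0])  # [0, 1, 2, 4, 3, 3]
```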


wrq9 commented Sep 27, 2024

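Hello. When I use emnlp2022_syngec_biaffine-dep-electra-zh-gopar, the parsed labels correctly include M (Missing errors), R (Redundant errors), and S (Substituted errors), but when I use the char model (emnlp2022_syngec_biaffine-dep-electra-zh-char), none of these three labels (M, R, S) appear. What is the reason?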

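```python
import os

# Download models through a Hugging Face mirror; set this before importing supar
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from supar import Parser

# Switch between the two parser checkpoints to compare their labels
# path = '../emnlp2022_syngec_biaffine-dep-electra-zh-char'
path = '../emnlp2022_syngec_biaffine-dep-electra-zh-gopar'
dep = Parser.load(path)

# Each character of the string is treated as one token (6 tokens here)
tree = dep.predict("今天是星期。", verbose=False, buckets=32, prob=True)
print(f'arcs: {tree.arcs[0]}')  # head index of each token (0 = root)
print(f'rels: {tree.rels[0]}')  # dependency label of each token
```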

Output with gopar:

```
arcs: [2, 3, 0, 5, 3, 3]
rels: ['app', 'top', 'root', 'app', 'attr', 'M']
```

Output with char:

```
arcs: [2, 3, 0, 5, 3, 3]
rels: ['app', 'top', 'root', 'app', 'attr', 'punct']
```
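To compare the two checkpoints programmatically, a small helper can scan the predicted relations for the three error labels. `find_error_labels` is a hypothetical helper, and the label lists are copied from the two outputs above. The result only restates the observation in the question: the gopar checkpoint marks the final token `。` with `M`, while the char checkpoint assigns ordinary relations only.

```python
# Error-type labels described in the question (M, R, S), which GOPar
# outputs alongside ordinary dependency relations
ERROR_LABELS = {"M", "R", "S"}


def find_error_labels(tokens, rels):
    """Return (position, token, label) for each token carrying an error label."""
    return [
        (i, tok, rel)
        for i, (tok, rel) in enumerate(zip(tokens, rels))
        if rel in ERROR_LABELS
    ]


tokens = list("今天是星期。")  # one token per character, as in the snippet
gopar_rels = ['app', 'top', 'root', 'app', 'attr', 'M']
char_rels = ['app', 'top', 'root', 'app', 'attr', 'punct']

print(find_error_labels(tokens, gopar_rels))  # [(5, '。', 'M')]
print(find_error_labels(tokens, char_rels))   # []
```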
