-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update PySBD component to support spaCy v3 #114
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #114 +/- ##
==========================================
- Coverage 98.43% 98.35% -0.09%
==========================================
Files 38 39 +1
Lines 1150 1153 +3
==========================================
+ Hits 1132 1134 +2
- Misses 18 19 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
}, | ||
entry_points={ | ||
"spacy_factories": ["pysbd = pysbd.utils:PySBDFactory"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmitsch I would have to remove this entrypoint now as spacy uses @Language.factory
decorator compulsorily in spacy v3 to register a custom component and since PySBDFactory
resides at pysbd/utils.py, I would need to add spacy>=3
requirement to pysbd's setup.py
I wish to keep pysbd
lightweight (use only inbuilt python modules).
Do you have any thoughts on this? Like other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's tricky. You could have a look at vendoring @Language.factory
. You'd definitely need the registry functionality which can be found in https://github.com/explosion/catalogue now. It's still relatively lightweight, but it's already breaking your requirement of only having inbuilt Python modules.
How's spacy_factories
used within PSBD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used in pysbd
.
psybd
python library is shipping psybd
named spaCy component out-of-the-box via entrypoints.
Given a python environment with spacy
and pysbd
installed, nlp.add_pipe("pysbd")
will work without importing pysbd
explicitly.
More info here: https://spacy.io/usage/saving-loading#entry-points-components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively: you could offer pybsd
and pybsd[spacy]
, with only the latter supporting the usage as a spaCy v3.x component and installing spaCy by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was thinking of doing this. Will look into it 👍🏼
|
||
doc = nlp(text) | ||
print('sent_id', 'sentence', sep='\t|\t') | ||
for sent_id, sent in enumerate(doc.sents, start=1): | ||
print(sent_id, sent.text, sep='\t|\t') | ||
print(sent_id, repr(sent.text), sep='\t|\t') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity: why is the repr()
necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
}, | ||
entry_points={ | ||
"spacy_factories": ["pysbd = pysbd.utils:PySBDFactory"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's tricky. You could have a look at vendoring @Language.factory
. You'd definitely need the registry functionality which can be found in https://github.com/explosion/catalogue now. It's still relatively lightweight, but it's already breaking your requirement of only having inbuilt Python modules.
How's spacy_factories
used within PSBD?
Are you still working on this? Otherwise I could have a look. |
Hey @davidberenstein1957, sure you can take a look at it. Let me know if you happen to work on the recommendations suggested by @rmitsch above. |
here would be an option to update the factory method and not require spacey as a hard requirement to pysbd.
` |
PySBD component using Language.factory