Named entity recognition #113

Open
kenalba opened this issue Oct 2, 2020 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@kenalba
Contributor

kenalba commented Oct 2, 2020

Detecting proper names would be a useful feature for analysis that is tied less to gendered pronouns. It would let a user see the adjectives used to describe a particular character, or a particular family of characters.

A naïve solution might simply search for capitalized words that don't appear in a dictionary, though I suspect we'll need a more robust algorithm to make this usable. Our POS tagger might also get us part of the way there. There are open-source approaches to the problem; spaCy, for one, looks like it might do what we want.
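
To make the naïve idea concrete, here is a minimal sketch using only the standard library. The `COMMON_WORDS` set and the `naive_proper_names` function are hypothetical stand-ins (a real run would need a full word list), and none of this is existing project code.

```python
import re

# Hypothetical stand-in for a real dictionary word list (assumption).
COMMON_WORDS = {
    "the", "she", "was", "handsome", "clever", "and", "rich",
    "lived", "in", "with", "her", "father",
}

def naive_proper_names(text, dictionary=COMMON_WORDS):
    """Return capitalized tokens whose lowercase form is not in `dictionary`."""
    tokens = re.findall(r"[A-Za-z]+(?:'[a-z]+)?", text)
    candidates = []
    for tok in tokens:
        # Sentence-initial words like "She" are capitalized too, so the
        # dictionary lookup is what filters them out.
        if tok[0].isupper() and tok.lower() not in dictionary:
            if tok not in candidates:
                candidates.append(tok)
    return candidates

sample = ("Emma Woodhouse was handsome, clever, and rich. "
          "She lived in Highbury with her father.")
print(naive_proper_names(sample))  # ['Emma', 'Woodhouse', 'Highbury']
```

Note that it already shows the weaknesses: "Highbury" is a place rather than a person, and "Emma Woodhouse" comes back as two unrelated tokens.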

@kenalba kenalba added the enhancement New feature or request label Oct 2, 2020
@kenalba kenalba changed the title from "Proper (character) name detection" to "Named entity recognition" Oct 2, 2020
@fyang3 fyang3 self-assigned this Oct 27, 2020
@fyang3
Contributor

fyang3 commented Oct 27, 2020

One issue: a character in a novel might have more than one name. For instance, Emma Woodhouse might also be called just Emma. If we only want to identify all the names that appear in a sentence, that should not be difficult; but we do need to acknowledge that a single character can have multiple "identities".
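
As a rough illustration of the problem, one simple heuristic is to fold a shorter mention into a longer one when all of its tokens appear in the longer name. The function `group_mentions` below is a hypothetical sketch under that assumption, not a proposal for the final algorithm.

```python
def group_mentions(mentions):
    """Group name mentions whose tokens are a subset of a longer mention's tokens."""
    # Longest mentions first, so full names become the canonical keys.
    ordered = sorted(set(mentions), key=lambda m: (-len(m.split()), m))
    groups = {}
    for mention in ordered:
        tokens = set(mention.split())
        for canonical in groups:
            if tokens <= set(canonical.split()):
                groups[canonical].append(mention)
                break
        else:
            # No existing canonical name covers this mention: start a new group.
            groups[mention] = [mention]
    return groups

mentions = ["Emma", "Emma Woodhouse", "Harriet Smith", "Harriet", "Emma"]
print(group_mentions(mentions))
```

This breaks down quickly: a bare "Woodhouse" would be attached to whichever full name containing it comes first, even though several characters can share a surname.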

@kenalba
Contributor Author

kenalba commented Oct 28, 2020

A very good point! Collapsing multiple 'nicknames' into a single entity is a nontrivial task. If it seems possible, though, we should consider looking into it.

It might mean creating a Character class that has a series of other names associated with it. This approach would pay particular dividends were we to look into, say, fanfiction, where the same character might show up in multiple novels. Worth brainstorming about, for sure.
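
To seed the brainstorming, a `Character` class along these lines might look like the sketch below. The field and method names (`aliases`, `add_alias`, `matches`) are assumptions made for illustration; nothing like this exists in the codebase yet.

```python
from dataclasses import dataclass, field

@dataclass
class Character:
    """Hypothetical sketch: one canonical name plus known alternate names."""
    name: str
    aliases: set = field(default_factory=set)

    def add_alias(self, alias):
        """Record another name this character goes by."""
        self.aliases.add(alias)

    def matches(self, mention):
        """True if `mention` is the canonical name or a recorded alias."""
        return mention == self.name or mention in self.aliases

emma = Character("Emma Woodhouse", {"Emma"})
emma.add_alias("Miss Woodhouse")
print(emma.matches("Miss Woodhouse"))  # True
print(emma.matches("Harriet"))         # False
```

Because a `Character` is independent of any one text, the same object could in principle be shared across several novels, which is what the fanfiction case would need.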
