Handle Epistolary Novels #137

Open
kenalba opened this issue Oct 26, 2020 · 1 comment
Comments

@kenalba
Contributor

kenalba commented Oct 26, 2020

We could build a method that takes a Document, detects whether it is an epistolary novel, and, if so, splits the document into a dictionary of letters (or a list of Letter objects?). We'll want to programmatically detect the writer of each letter and include that in the metadata.

Ideally, we can programmatically determine the metadata for each letter: writer, date, recipient, and so on. That's going to be tricky, but it may be possible. If we combine this functionality with our hypothetical named entity recognition module (to get a character list) and an ML-based gender guesser for each character, we can do some classy stuff.
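To make the idea concrete, here is a rough sketch of what the splitting step might look like. Everything in it is an assumption for discussion: the `Letter` dataclass, the function names, and especially the heuristics (letters start at a "Letter N" heading, the recipient comes from a "Dear X," salutation, the writer is a short final line). It works on a plain string rather than our Document class, and real novels will need far more robust rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed heuristic: a letter begins at a heading line such as
# "Letter 1", "LETTER IV", or "Letter 3."
LETTER_HEADING = re.compile(
    r"^[ \t]*letter[ \t]+([ivxlcdm]+|\d+)\.?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Assumed heuristic: the recipient appears in a salutation such as
# "Dear Margaret," or "My dear Mrs. Saville,".
SALUTATION = re.compile(
    r"^[ \t]*(?:[Mm]y[ \t]+)?[Dd]ear(?:est)?[ \t]+"
    r"([A-Z][\w.]*(?:[ \t]+[A-Z][\w.]*)*)[ \t]*[,:]",
    re.MULTILINE,
)


@dataclass
class Letter:
    """Hypothetical container for one letter and its guessed metadata."""
    index: int
    body: str
    writer: Optional[str] = None
    recipient: Optional[str] = None


def _guess_recipient(body: str) -> Optional[str]:
    match = SALUTATION.search(body)
    return match.group(1) if match else None


def _guess_writer(body: str) -> Optional[str]:
    # Treat a short final line as a signature; otherwise give up.
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if len(lines) < 2:
        return None
    last = lines[-1]
    if len(last.split()) <= 4 and not last.endswith(("?", "!")):
        return last.rstrip(".,")
    return None


def is_epistolary(text: str, min_letters: int = 3) -> bool:
    """Guess that a text is epistolary if it has enough letter headings."""
    return len(LETTER_HEADING.findall(text)) >= min_letters


def split_letters(text: str) -> List[Letter]:
    """Split text at letter headings and attach guessed metadata."""
    headings = list(LETTER_HEADING.finditer(text))
    letters = []
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        body = text[heading.end():end].strip()
        letters.append(
            Letter(
                index=i + 1,
                body=body,
                writer=_guess_writer(body),
                recipient=_guess_recipient(body),
            )
        )
    return letters
```

A list of `Letter` objects seems easier to extend than a dictionary, since date and other fields can be added later without changing how callers iterate over the letters.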

@fyang3
Contributor

fyang3 commented Oct 27, 2020

A pretty good overview of named entity recognition: https://www.mygreatlearning.com/blog/named-entity-recognition/
Microsoft Azure also has an NLP module for this: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/named-entity-recognition
For detecting gender from names, we could use NLTK or scikit-learn and build our own classifier (so we need to decide which features we'd like): https://www.geeksforgeeks.org/python-gender-identification-by-name-using-nltk/
This is an example of building a classifier: https://gist.github.com/vinovator/6e5bf1e1bc61687a1e809780c30d6bf6
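To give a feel for the feature question, here is a small dependency-free sketch of a name classifier. It is not the code from the links above: it is a hand-rolled naive Bayes with add-one smoothing, the three features (first letter, last letter, last two letters) are only one possible choice, and `TOY_NAMES` is a made-up handful of examples for illustration. A real version would train on a proper names corpus, probably through NLTK or scikit-learn.

```python
import math
from collections import Counter, defaultdict


def name_features(name):
    """Assumed feature set: first letter, last letter, last two letters."""
    n = name.strip().lower()
    return {
        "first_letter": n[:1],
        "last_letter": n[-1:],
        "last_two": n[-2:],
    }


class NaiveBayesNameClassifier:
    """Minimal naive Bayes over categorical name features."""

    def __init__(self):
        self.label_counts = Counter()
        # (label, feature name) -> Counter of observed feature values
        self.feature_counts = defaultdict(Counter)
        # feature name -> set of all values seen in training
        self.feature_values = defaultdict(set)

    def train(self, labeled_names):
        for name, label in labeled_names:
            self.label_counts[label] += 1
            for feat, value in name_features(name).items():
                self.feature_counts[(label, feat)][value] += 1
                self.feature_values[feat].add(value)

    def classify(self, name):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each feature value.
            score = math.log(count / total)
            for feat, value in name_features(name).items():
                seen = self.feature_counts[(label, feat)][value]
                vocab = len(self.feature_values[feat]) + 1  # +1 for unseen values
                score += math.log((seen + 1) / (count + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, purely illustrative and far too small for real use.
TOY_NAMES = [
    ("Emma", "female"), ("Clarissa", "female"),
    ("Pamela", "female"), ("Anna", "female"),
    ("Robert", "male"), ("Victor", "male"),
    ("Henry", "male"), ("Walton", "male"),
]

classifier = NaiveBayesNameClassifier()
classifier.train(TOY_NAMES)
print(classifier.classify("Julia"))
```

Swapping in different features only means changing `name_features`, so it should be cheap to compare a few options once we have real training data.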
