Scraper - Tag content type #26

Open
georgerichardson opened this issue Feb 5, 2017 · 4 comments

Comments

@georgerichardson

During scraping, can we tag whether something is text/video/image/pdf? Extra dessert if you can discern between news/blog etc.
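
One possible way to do the basic tagging, as a rough sketch rather than what the scraper actually does: map the MIME type from the HTTP `Content-Type` header onto the four tags, falling back to the URL's file extension when no header is available. The function name `tag_content_type` and the `"unknown"` fallback tag are assumptions made for illustration. This does not attempt the news/blog distinction.

```python
import mimetypes
from urllib.parse import urlparse


def tag_content_type(content_type_header=None, url=None):
    """Return 'text', 'video', 'image', 'pdf' or 'unknown' (hypothetical tag set)."""
    mime = None
    if content_type_header:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = content_type_header.split(";", 1)[0].strip().lower()
    if not mime and url:
        # Fall back to guessing from the file extension in the URL path.
        mime, _ = mimetypes.guess_type(urlparse(url).path)
    if not mime:
        return "unknown"
    if mime == "application/pdf":
        return "pdf"
    major = mime.split("/", 1)[0]
    if major in ("video", "image"):
        return major
    if major == "text" or mime == "application/xhtml+xml":
        return "text"
    return "unknown"
```

For example, `tag_content_type("text/html; charset=utf-8")` gives `"text"`, and `tag_content_type(None, "http://example.com/a/clip.mp4?x=1")` gives `"video"`.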

@simonb83
Collaborator

simonb83 commented Feb 7, 2017

For video / image, if there is no accompanying text on the page, we are likely to end up tagging the link as not relevant, since the idea is to base relevance on whether or not the page talks about the reporting terms.

So in both these cases I feel that these pages are likely to be mixed content?
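
To make the "mixed content" idea concrete, here is a minimal sketch that counts media tags and visible words in a page's HTML and labels the page accordingly. Everything in it is an assumption for illustration: the label names, the `describe_page` helper, the choice of `video`/`img`/`iframe` as media tags, and the arbitrary `min_words=50` threshold.

```python
from html.parser import HTMLParser


class MediaTextCounter(HTMLParser):
    """Counts media tags and visible words (hypothetical helper)."""

    MEDIA_TAGS = {"video", "img", "iframe"}  # assumed set of media tags
    SKIP_TAGS = {"script", "style"}          # text inside these is not visible

    def __init__(self):
        super().__init__()
        self.media = 0
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.MEDIA_TAGS:
            self.media += 1
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def describe_page(html, min_words=50):
    """Label a page 'mixed', 'media-only' or 'text' (threshold is arbitrary)."""
    counter = MediaTextCounter()
    counter.feed(html)
    if counter.media and counter.words >= min_words:
        return "mixed"
    if counter.media:
        return "media-only"
    return "text"
```

Under this sketch, a "media-only" page is the kind that would likely be tagged as not relevant, because there is no text to check against the reporting terms.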

@georgerichardson georgerichardson removed this from the scraper + pipeline v0.1 milestone Feb 8, 2017
@ghost

ghost commented Feb 17, 2017

Have you guys ever considered Alchemy Data News as a source? You can have it return RSS feeds from sites if present as well.

@georgerichardson
Author

Thanks @CL34R. Checking that out.

@georgerichardson
Author

Latest approach in classification notebook
