Scraper - Tag content type #26

georgerichardson · 2017-02-05T03:14:55Z

During scraping, can we tag whether something is text/video/image/PDF? Extra dessert if you can discern between news/blog etc.
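A minimal sketch of what this kind of tagging could look like, assuming the scraper has the URL and, when available, the HTTP `Content-Type` header of each response. The function name `tag_content_type` and the label set are hypothetical and not taken from the project; only the Python standard library is used.

```python
import mimetypes
from urllib.parse import urlparse


def tag_content_type(url, content_type_header=None):
    """Return 'text', 'video', 'image', 'pdf', or 'unknown' (hypothetical labels)."""
    mime = None
    if content_type_header:
        # Drop parameters such as "; charset=utf-8".
        mime = content_type_header.split(";", 1)[0].strip().lower()
    if not mime:
        # Fall back to guessing from the file extension in the URL path.
        mime, _ = mimetypes.guess_type(urlparse(url).path)
    if not mime:
        return "unknown"
    if mime == "application/pdf":
        return "pdf"
    if mime == "application/xhtml+xml":
        return "text"
    major = mime.split("/", 1)[0]
    if major in ("text", "video", "image"):
        return major
    return "unknown"
```

This only covers the text/video/image/PDF split. Telling news from blog would need something beyond the MIME type, so it is left out here.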

simonb83 · 2017-02-07T22:27:22Z

For video / image, if there is no accompanying text on the page, we are likely to end up tagging the link as not relevant, since the idea is to base relevance on whether or not the page talks about the reporting terms.

So in both of these cases, I feel that the pages are likely to be mixed content?
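To illustrate the mixed-content point, here is a rough sketch that counts visible words and media tags in an HTML page and labels it accordingly. Everything here is an assumption for illustration: the names `tag_html_page` and `_PageStats`, the label strings, the choice of media tags, and the 50-word threshold are not from the project.

```python
from html.parser import HTMLParser


class _PageStats(HTMLParser):
    """Count visible words and media elements in an HTML document."""

    MEDIA_TAGS = {"img", "video", "iframe"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.media_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.MEDIA_TAGS:
            self.media_count += 1
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore the contents of <script> and <style>.
        if not self._skip_depth:
            self.word_count += len(data.split())


def tag_html_page(html, min_words=50):
    """Return 'mixed', 'text', 'media-only', or 'empty' (hypothetical labels)."""
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    has_text = stats.word_count >= min_words  # assumed threshold
    has_media = stats.media_count > 0
    if has_text and has_media:
        return "mixed"
    if has_text:
        return "text"
    if has_media:
        return "media-only"
    return "empty"
```

Under this sketch, a page with only a video and no text comes out as "media-only", which is the case that would probably be dropped as not relevant.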

ghost · 2017-02-17T00:56:45Z

Have you guys ever considered Alchemy Data News as a source? You can have it return RSS feeds from sites, if present, as well.

georgerichardson · 2017-02-19T03:28:22Z

Thanks @CL34R. Checking that out.

georgerichardson · 2017-05-04T18:02:45Z

Latest approach in classification notebook

georgerichardson added the data-collection label Feb 5, 2017

georgerichardson added this to the scraper + pipeline v0.1 milestone Feb 5, 2017

georgerichardson removed this from the scraper + pipeline v0.1 milestone Feb 8, 2017

georgerichardson added the scraper label Feb 19, 2017
