# Burrow

Because gophers don't crawl. They burrow.

An experiment in writing a document crawler in Go.

The API aims to be expressive yet succinct. Take, for example, the task of crawling through HTML documents by following anchor hrefs:

```go
crawl.Through(urlsUsingAnchor).BeginWith(seedUrls, crawledUrlSink)
```
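To give a feel for what an API shaped like this might look like, here is a minimal, self-contained sketch. Burrow's actual types are not shown in this README, so every signature below (the `Extractor` function type, the `Crawler` struct, the breadth-first loop) is an assumption for illustration, not the real implementation:

```go
package crawl

import (
	"io"
	"net/http"
)

// Extractor derives the next URLs to visit from a fetched document.
// Something like urlsUsingAnchor would collect anchor hrefs
// (hypothetical signature).
type Extractor func(body []byte) []string

// Crawler carries the extraction strategy chosen via Through.
type Crawler struct{ extract Extractor }

// Through selects how follow-up URLs are discovered in each document.
func Through(extract Extractor) *Crawler { return &Crawler{extract: extract} }

// BeginWith crawls breadth-first from the seed URLs, reporting each
// successfully fetched URL on sink. Error handling and resolution of
// relative hrefs are elided for brevity.
func (c *Crawler) BeginWith(seeds []string, sink chan<- string) {
	seen := make(map[string]bool)
	queue := append([]string(nil), seeds...)
	for len(queue) > 0 {
		url := queue[0]
		queue = queue[1:]
		if seen[url] {
			continue
		}
		seen[url] = true
		resp, err := http.Get(url)
		if err != nil {
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		sink <- url
		queue = append(queue, c.extract(body)...)
	}
	close(sink)
}
```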

Note that the current example server implementation does not persist crawled entities to disk; instead, it keeps an in-memory pool of URLs, which are polled and removed as the request multiplexer sees fit. Therefore, if scalability is a concern and you expect more than `urlSinkSize` concurrent requests, I would recommend using a dedicated crawling engine.
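The pool described above could be as simple as a bounded channel. The sketch below assumes `urlSinkSize` is that channel's capacity (the name comes from the paragraph above; the value and both method names are illustrative). It makes the scalability limit concrete: once the pool is full, new URLs are dropped rather than persisted.

```go
package crawl

// urlSinkSize caps how many pending URLs the in-memory pool can hold
// at once (illustrative value).
const urlSinkSize = 1024

// urlPool is a bounded queue of URLs waiting to be crawled.
// Nothing is persisted to disk.
type urlPool struct{ urls chan string }

func newURLPool() *urlPool {
	return &urlPool{urls: make(chan string, urlSinkSize)}
}

// offer adds a URL if there is room; otherwise the URL is silently
// dropped, which is the scalability limit noted above.
func (p *urlPool) offer(url string) bool {
	select {
	case p.urls <- url:
		return true
	default:
		return false
	}
}

// poll removes and returns the next URL as the request multiplexer
// sees fit; ok is false when the pool is empty.
func (p *urlPool) poll() (url string, ok bool) {
	select {
	case url = <-p.urls:
		return url, true
	default:
		return "", false
	}
}
```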