technologyNutch
Lukas Schmelzeisen edited this page Aug 5, 2013 · 13 revisions
-
batchId
When generating URLs to be fetched later, a batchId can be assigned to a batch of generated URLs. This allows you to first generate multiple batches of URLs, and then fetch them later one after another without having to wait for one big fetch to finish.
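The idea can be illustrated with a small toy model. This is only a sketch of the concept and not Nutch's API: `generate_batches`, `fetch_batch`, and the `batch-N` ID format are hypothetical names chosen for illustration.

```python
from itertools import islice


def generate_batches(urls, batch_size, prefix="batch"):
    """Split urls into chunks and tag each chunk with a (hypothetical) batch ID."""
    batches = {}
    it = iter(urls)
    index = 0
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        batches[f"{prefix}-{index}"] = chunk
        index += 1
    return batches


def fetch_batch(batches, batch_id):
    """Stand-in for a fetch job: return the URLs belonging to one batch."""
    return list(batches[batch_id])


# Generate every batch up front ...
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
batches = generate_batches(urls, batch_size=2)

# ... then fetch them one at a time, in whatever order is convenient.
for batch_id in batches:
    fetched = fetch_batch(batches, batch_id)
```

Because each batch is addressed by its own ID, one batch can be processed while others are still waiting, instead of everything depending on a single large fetch.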
-
crawlId
An identifier that describes a crawl. Open question: would it be useful to simply generate crawlIds from timestamps?
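If timestamps were used, generating such an ID could look roughly like the sketch below. The function name `make_crawl_id`, the `crawl` prefix, and the timestamp format are assumptions made for illustration, not something Nutch prescribes.

```python
from datetime import datetime, timezone


def make_crawl_id(prefix="crawl", now=None):
    """Build a crawl ID from a UTC timestamp (hypothetical naming scheme)."""
    now = now or datetime.now(timezone.utc)
    # Second resolution: compact, and IDs sort chronologically as strings.
    return f"{prefix}_{now.strftime('%Y%m%dT%H%M%SZ')}"


crawl_id = make_crawl_id()
```

One caveat of this scheme is that two crawls started within the same second would get the same ID, so a finer resolution or an extra suffix may be needed if crawls can start concurrently.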
- Nutch2Crawling: describes the Nutch crawl jobs (generate, fetch, parse, updatedb).
- Understanding the columns/fields in Nutch 2.0 Webpage