born2crawl-web provides a set of concrete implementations of the InputProcessor
that can be used for web crawling.
dependencies {
    implementation("com.arthurivanets:born2crawl-web:x.y.z")
}
born2crawl-web depends on the following external dependencies:
- OkHttp - HTTP client for the JVM, Android, and GraalVM.
- jsoup - the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.
- WebPageCrawler - a concrete implementation of the InputProcessor that crawls web pages.
- FileDownloader - a concrete implementation of the InputProcessor that downloads files by URL.
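
To give a rough idea of the kind of HTML processing that page crawling involves, here is a minimal sketch that uses jsoup directly to collect the absolute links from a page. It does not show the born2crawl-web API: `extractLinks` is a hypothetical helper written for illustration, and the HTML and URLs are made-up sample data. Only the jsoup calls (`Jsoup.parse`, `select`, `absUrl`) are real library functions.

```kotlin
import org.jsoup.Jsoup

// Hypothetical helper (not part of born2crawl-web): returns the absolute URLs
// of all links found in the given HTML, without duplicates.
fun extractLinks(html: String, baseUrl: String): List<String> {
    // The base URL lets jsoup resolve relative hrefs into absolute ones.
    val document = Jsoup.parse(html, baseUrl)

    return document.select("a[href]")
        .map { it.absUrl("href") }   // empty string if the URL cannot be resolved
        .filter { it.isNotEmpty() }
        .distinct()
}

fun main() {
    // Sample data, assumed for the sake of the example.
    val html = """
        <html><body>
          <a href="/docs">Docs</a>
          <a href="https://example.org/about">About</a>
          <a href="/docs">Docs again</a>
        </body></html>
    """.trimIndent()

    extractLinks(html, "https://example.com").forEach(::println)
}
```

A crawler would typically feed the extracted links back in as new inputs. How WebPageCrawler does this internally is not covered here.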