- upgrade to storm 2.7.2
- upgrade to storm crawler 3.2.0 (all project-internal package names have changed)
- based on Java 21
- correction in query for number of unchecked links to exclude non-active links
- logging number of unchecked links (issue #84)
- dependency upgrade for linkchecker-persistence
- upgrading dependencies to Storm 2.6.1 and Storm Crawler 2.11
=> requirement to change user agent string (issues #78, #79, #80)
- removing acknowledgement from MetricsFetcherBolt (issue #81)
- adding functionality for a host-specific http.timeout
- upgrading dependencies to Storm 2.5.0 and Storm Crawler 2.10
- moving flux and conf files to maven resource directory (requires additional command line option -R)
- set http.timeout via environment variable
- activating http.agent.email and http.agent.description again
- allowing configuration of a host-specific crawl delay
- redesign of LPASpout: it now takes a native SQL query as constructor parameter from the crawler.flux file
- modification of crawler.flux to have two instances of LPASpout, one for handle URLs and one for non-handle URLs
- logging crawl delays from robots.txt in MetricsFetcherBolt when they exceed the configured fetcher.server.delay
- writing the latest checking results to an object file to be used by curation-web
- removing redundant settings from configuration file
- bug fixes
- configuring okhttp.HttpProtocol (issue #52)
- shifting status logging from MetricsFetcherBolt to LPASpout (issue #59)
- bug fix for issue #58
- upgrade of storm crawler dependency (issue #53)
- bug fix for issue #57
- adding missing PartitionerBolt again (bug fix!)
- bugfix in class MetricsFetcherBolt to prevent null message
- bugfix in dependency linkchecker-persistence
- replacement of the persistence layer: the resource availability status API (RASA) is replaced by curation-persistence
- inclusion of maven wrapper
- deletion of template_crawler-conf.yaml and use of environment variables in crawler-conf.yaml
- upgrade to storm 2.4.0 and storm crawler 2.4
- improved algorithm for next links to check
- trimming URLs (done by RASA) and escaping whitespace in URLs used for requests
- checking time is now taken at the start of the check instead of the end
- proper control flow instead of using exceptions
- fixing bug of doubled log messages in RASASpout
- reducing log messages of MetricsFetcherBolt to one message on info-level per 100 checks
- increase size of content-length/byteSize from int to long in Java and from int to bigint in MySQL/MariaDB
- increase size of message from varchar(256) to varchar(1024) with a truncation in Java if the message is longer than 1024 characters
- Java version upgrade to Java 11
- dependency upgrade to storm 2.2
- dependency upgrade to storm crawler 2.1
- using resource-availability-status-api for db access in RASASpout and StatusUpdaterBolt instead of direct db access
- storing originalUrl in metadata which allows the use of unmodified storm crawler Abstract super-classes
- using storm crawler's SimpleURLBuffer instead of LinkedList to handle tuples
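
Two of the entries above (truncating the message to 1024 characters, and trimming URLs and escaping whitespace before a request) could look roughly like the following. This is only a minimal sketch, not the project's actual code: the class name `LinkCheckerSketch`, the constant `MAX_MESSAGE_LENGTH` and both method names are hypothetical, and it assumes that "escaping" means percent-encoding plain space characters as `%20`.

```java
final class LinkCheckerSketch {

    // Hypothetical constant mirroring the varchar(1024) column size.
    static final int MAX_MESSAGE_LENGTH = 1024;

    private LinkCheckerSketch() {
    }

    /** Cuts a message down to the column size; null and short messages pass through unchanged. */
    static String truncateMessage(String message) {
        if (message == null || message.length() <= MAX_MESSAGE_LENGTH) {
            return message;
        }
        return message.substring(0, MAX_MESSAGE_LENGTH);
    }

    /** Trims surrounding whitespace and percent-encodes the remaining space characters (assumption). */
    static String prepareUrl(String url) {
        return url.trim().replace(" ", "%20");
    }

    public static void main(String[] args) {
        // A 2000-character message is cut to 1024 characters.
        System.out.println(truncateMessage("x".repeat(2000)).length());
        // Leading and trailing blanks are dropped, the inner blank is escaped.
        System.out.println(prepareUrl("  http://example.org/a b  "));
    }
}
```

Truncating in Java before the insert keeps an over-long message from being rejected or silently cut by the database, which is presumably why the changelog pairs the column change with a truncation step.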