Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler 0.5.0 Beta 4
- Update to py-wacz 0.4.3, which is more tolerant of pages with invalid full text search data (skips such pages instead of failing WACZ creation)
- Support for 'scopeType: domain', and include http/https pages in scope by default
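As a rough illustration of what a domain-wide scope that accepts both http and https pages could look like, here is a minimal sketch. It is not the crawler's actual implementation: the function name in_domain_scope is hypothetical, and the choices to ignore a leading "www." on the seed and to treat subdomains as in scope are assumptions of this sketch.

```python
from urllib.parse import urlsplit


def in_domain_scope(seed_url: str, candidate_url: str) -> bool:
    """Hypothetical check: is candidate_url an http(s) page on the seed's domain?"""
    seed = urlsplit(seed_url)
    cand = urlsplit(candidate_url)

    # Both http and https pages are accepted, regardless of the seed's scheme.
    if cand.scheme not in ("http", "https"):
        return False

    seed_host = (seed.hostname or "").lower()
    cand_host = (cand.hostname or "").lower()
    if not seed_host or not cand_host:
        return False

    # Assumption of this sketch: a leading "www." on the seed is ignored.
    if seed_host.startswith("www."):
        seed_host = seed_host[4:]

    # Assumption of this sketch: subdomains of the seed's domain are in scope.
    return cand_host == seed_host or cand_host.endswith("." + seed_host)
```

With this sketch, a seed of https://example.com/ would accept http://example.com/page and https://blog.example.com/x, and reject https://example.org/ or a mailto: link.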
Browsertrix Crawler 0.5.0 Beta 3
Various fixes, including:
- Screencasting refactor, support screencast via redis, add new 'init' message
- Support for retrying pending URLs after a limited amount of time
- Redis: load queues gracefully to avoid large redis data load
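The idea of retrying pending URLs after a limited amount of time can be sketched with a small in-memory queue. This is only an illustration under stated assumptions: the crawler keeps its state in Redis, while the PendingQueue class, its method names, and the pending_timeout parameter below are all hypothetical.

```python
from collections import deque


class PendingQueue:
    """Hypothetical in-memory model of a crawl queue with pending-URL retry."""

    def __init__(self, pending_timeout: float):
        self.pending_timeout = pending_timeout  # seconds a URL may stay pending
        self.queue = deque()                    # URLs waiting to be crawled
        self.pending = {}                       # url -> time it was handed out

    def add(self, url: str) -> None:
        self.queue.append(url)

    def next_url(self, now: float):
        """Hand out the next URL and record when it became pending."""
        if not self.queue:
            return None
        url = self.queue.popleft()
        self.pending[url] = now
        return url

    def mark_done(self, url: str) -> None:
        self.pending.pop(url, None)

    def requeue_stale(self, now: float) -> list:
        """Move URLs that have been pending for too long back onto the queue."""
        stale = [u for u, t in self.pending.items()
                 if now - t >= self.pending_timeout]
        for url in stale:
            del self.pending[url]
            self.queue.append(url)
        return stale
```

The clock is passed in as `now` so the behavior is easy to test; a worker that crashed while holding a URL would simply never call mark_done(), and the URL would be requeued once the timeout has elapsed.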
Browsertrix Crawler 0.5.0 Beta 2
Add support for WACZ signing (experimental), enabled via WACZ_SIGN_URL and WACZ_SIGN_TOKEN env vars.
Browsertrix Crawler 0.5.0 Beta 1
Support for uploading WACZ to S3-compatible storage!
Browsertrix Crawler 0.5.0 Beta 0
Initial Build of 0.5.0 beta for testing!
Browsertrix Crawler 0.4.4
This release includes fixes to the block rules system and README improvements:
- Page Block Rules Fix: 'request already handled' errors by avoiding adding duplicate handlers to same page.
- Page Block Rules Fix: await all continue/abort() calls and catch errors.
- Page Block Rules: Don't apply to top-level page, print warning and recommend scope rules instead.
- Setup: Attempt to create the crawl working directory (cwd) specified via --cwd if it doesn't exist.
- Scope Types: Rename 'none' -> 'page' (single page only) and 'page' -> 'page-spa' (page with hashtags).
- README: Add more scope rule examples, clarify distinction between scope rules and block rules.
- README: Update old type -> scopeType, list new scope types.
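The two Page Block Rules fixes listed above (registering at most one handler per page, and awaiting every continue/abort call while catching errors) can be sketched as follows. This is a simplified model, not the crawler's code: FakePage, FakeRequest, and BlockRules are hypothetical stand-ins for the browser page, the intercepted request, and the rules object, and matching by substring is an assumption made to keep the example short.

```python
import asyncio
import weakref


class FakeRequest:
    """Hypothetical stand-in for an intercepted request."""

    def __init__(self, url, fail=False):
        self.url = url
        self.fail = fail      # simulate a request that was already handled
        self.result = None

    async def abort(self):
        if self.fail:
            raise RuntimeError("request already handled")
        self.result = "aborted"

    async def continue_(self):
        if self.fail:
            raise RuntimeError("request already handled")
        self.result = "continued"


class FakePage:
    """Hypothetical stand-in for a browser page that stores request handlers."""

    def __init__(self):
        self.handlers = []

    def on_request(self, handler):
        self.handlers.append(handler)


class BlockRules:
    def __init__(self, blocked_substrings):
        self.blocked_substrings = blocked_substrings
        self._pages = weakref.WeakSet()  # pages that already have a handler
        self.errors = []

    def init_page(self, page):
        # Register the handler at most once per page to avoid duplicates.
        if page in self._pages:
            return
        self._pages.add(page)
        page.on_request(self.handle_request)

    async def handle_request(self, request):
        blocked = any(s in request.url for s in self.blocked_substrings)
        try:
            # Await the call so a failure is caught here instead of being lost.
            if blocked:
                await request.abort()
            else:
                await request.continue_()
        except Exception as exc:
            self.errors.append(f"{request.url}: {exc}")
```

Calling init_page() twice on the same page leaves a single handler installed, and a request that raises during abort or continue is recorded in errors rather than propagating.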
Browsertrix Crawler 0.4.3
This release includes a bug fix for the 'block rules' system:
- When considering the 'inFrameUrl' for a navigation request for an iframe, use URL of parent frame.
- Always allow pywb proxy static scripts, ignoring block rules settings.
- When 'debug' set in 'logging' options, log blocked requests and conditional iframe requests.
Browsertrix Crawler 0.4.2
This release includes the following fixes:
- Compose/docs: Build latest image by default, update README to refer to latest image
- Fix typo in crawler.capturePrefix that resulted in directFetchCapture() always failing (also catch any fails in direct fetch)
- Tests: Update all tests to use test-crawls directory
- extractLinks() just extracts links from default selectors, allows custom driver to filter results
- loadPage() accepts a list of selector options with selector, extract, and isAttribute settings for further customization of link extraction
Released image published to Docker Hub at webrecorder/browsertrix-crawler:0.4.2
Browsertrix Crawler 0.4.1
This release includes a multi-platform build for amd64 and arm64 (Apple M1).
Other fixes and enhancements include:
- BlockRules Optimizations: don't intercept requests if no blockRules
- Profile Creation: Support extending existing profile by passing a --profile param to load on startup
- Profile Creation: Set default window size to 1600x900, add --windowSize param for setting custom size
- Behavior Timeouts: Add --behaviorTimeout to specify custom timeout for behaviors, in seconds (defaulting to 90 seconds)
- Load Wait Default: Switch to 'load,networkidle2' to speed-up waiting for initial load
- Multi-platform build: Support building for amd64 and Arm using oldwebtoday/chrome:91 images (check for google-chrome and chromium-browser automatically)
- CI: Build a multi-platform (amd64 and arm64) image on each release
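The behavior timeout described above amounts to bounding how long a page behavior may run. A minimal sketch of that idea, using Python's asyncio rather than the crawler's own code, is shown below; run_with_timeout and DEFAULT_BEHAVIOR_TIMEOUT are hypothetical names, and only the 90-second default comes from the release note.

```python
import asyncio

# Default taken from the release note; the constant name is hypothetical.
DEFAULT_BEHAVIOR_TIMEOUT = 90  # seconds


async def run_with_timeout(behavior, timeout=DEFAULT_BEHAVIOR_TIMEOUT):
    """Run an async behavior, giving up once `timeout` seconds have passed.

    Returns True if the behavior finished in time, False if it was cut off.
    """
    try:
        await asyncio.wait_for(behavior(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

A caller would pass a coroutine function, for example `asyncio.run(run_with_timeout(some_behavior, timeout=30))`, where some_behavior is whatever per-page routine needs to be bounded.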
Browsertrix Crawler 0.4.1 Beta 1
[Testing Multi-platform building]
(Beta) Changes for 0.4.1
- BlockRules Optimizations: don't intercept requests if no blockRules
- Profile Creation: Support extending existing profile by passing a --profile param to load on startup
- Behavior Timeouts: Add --behaviorTimeout to specify custom timeout for behaviors, in seconds (defaulting to 90 seconds)
- Load Wait Default: Switch to 'load,networkidle2' to speed-up waiting for initial load
- Multi-platform build: Support building for amd64 and Arm using oldwebtoday/chrome:91 images (check for google-chrome and chromium-browser automatically)
- CI: Build amd64 and arm64 images on each release