ganapativs/puppeteer-warc

Puppeteer WARC

This project demonstrates how to use Puppeteer to render a web page and create a WARC (Web ARChive) file of the rendered page and its resources. This is useful for archiving web pages for long-term storage or offline browsing.

Requirements

  • Node.js: Ensure you have Node.js installed (version 22 or higher is recommended). You can download it from nodejs.org.
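As a quick way to confirm the requirement above, the following minimal sketch compares the running Node.js major version against the recommended minimum. The helper name `meetsNodeVersion` is hypothetical and is not part of this repository.

```javascript
// Hypothetical helper (not part of the project): returns true when the
// major version in `version` is at least `minMajor`.
function meetsNodeVersion(minMajor, version = process.versions.node) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return major >= minMajor;
}

// 22 is the version this README recommends.
if (!meetsNodeVersion(22)) {
  console.warn(
    `Node.js ${process.versions.node} detected; version 22 or higher is recommended.`
  );
}
```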

Installation

  1. Clone the repository to download the project files to your local machine:

    git clone https://github.com/ganapativs/puppeteer-warc.git
    cd puppeteer-warc
  2. Install the dependencies listed in package.json:

    npm install

Usage

Writing a WARC File

To create a WARC file from a website, use the src/write-warc-cli.mjs script. This script will render the specified website and create a WARC file containing the page and its resources. It will also generate a screenshot of the web page, which can be useful for debugging.

  • Command:

    node src/write-warc-cli.mjs <website-url>
  • Example: To create a WARC file for https://example.com, run:

    node src/write-warc-cli.mjs https://example.com
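To illustrate what ends up inside the output file, here is a minimal sketch of how one captured HTTP response could be serialized as a WARC/1.1 record and gzipped. It uses only Node.js built-ins and is not the code in src/write-warc-cli.mjs, which may rely on a dedicated WARC library. The function names `buildWarcRecord`, `httpResponseBlock`, and `toWarcGz`, as well as the sample response, are assumptions made for the example.

```javascript
// CommonJS form; in an .mjs file use `import zlib from 'node:zlib'` instead.
const zlib = require('node:zlib');
const { randomUUID } = require('node:crypto');

const CRLF = '\r\n';

// Build one WARC/1.1 record: version line, named headers, a blank line,
// the content block, then two CRLFs that terminate the record.
function buildWarcRecord({ type, targetUri, contentType, body, date = new Date() }) {
  const block = Buffer.isBuffer(body) ? body : Buffer.from(body, 'utf8');
  const headers = [
    'WARC/1.1',
    `WARC-Type: ${type}`,
    `WARC-Record-ID: <urn:uuid:${randomUUID()}>`,
    // Drop milliseconds so the timestamp reads YYYY-MM-DDThh:mm:ssZ.
    `WARC-Date: ${date.toISOString().replace(/\.\d{3}Z$/, 'Z')}`,
    `WARC-Target-URI: ${targetUri}`,
    `Content-Type: ${contentType}`,
    `Content-Length: ${block.length}`,
  ].join(CRLF);
  return Buffer.concat([
    Buffer.from(headers + CRLF + CRLF, 'utf8'),
    block,
    Buffer.from(CRLF + CRLF, 'utf8'),
  ]);
}

// Serialize a captured response as a raw HTTP message, the block format
// used with Content-Type "application/http; msgtype=response".
function httpResponseBlock({ status, statusText, headers, body }) {
  const head = [`HTTP/1.1 ${status} ${statusText}`]
    .concat(Object.entries(headers).map(([name, value]) => `${name}: ${value}`))
    .join(CRLF);
  return Buffer.concat([Buffer.from(head + CRLF + CRLF, 'utf8'), Buffer.from(body)]);
}

// Gzip each record as its own member and concatenate the members,
// which is the usual convention for .warc.gz files.
function toWarcGz(records) {
  return Buffer.concat(records.map((record) => zlib.gzipSync(record)));
}

// Made-up sample response, standing in for data captured by the browser.
const record = buildWarcRecord({
  type: 'response',
  targetUri: 'https://example.com/',
  contentType: 'application/http; msgtype=response',
  body: httpResponseBlock({
    status: 200,
    statusText: 'OK',
    headers: { 'content-type': 'text/html' },
    body: '<h1>Hello</h1>',
  }),
});
const archive = toWarcGz([record]);
console.log(`record: ${record.length} bytes, gzipped: ${archive.length} bytes`);
```

In a real capture, the status, headers, and body would come from the responses Puppeteer observes while the page loads, and `archive` would be written to disk with `fs.writeFileSync`.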

Reading a WARC File

To read and print the contents of a WARC file, use the src/read-warc-cli.mjs script. This script will output the records contained in the specified WARC file.

  • Command:

    node src/read-warc-cli.mjs <path-to-warc-file>
  • Example: To read the contents of examplecom.warc.gz, run:

    node src/read-warc-cli.mjs examplecom.warc.gz
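For readers curious about what reading a WARC file involves, the sketch below parses records from an in-memory buffer using only Node.js built-ins. It is an illustration of the record layout, not the implementation in src/read-warc-cli.mjs. The function name `parseWarc` and the sample record are assumptions.

```javascript
// CommonJS form; in an .mjs file use `import zlib from 'node:zlib'` instead.
const zlib = require('node:zlib');

// Parse WARC records from a buffer. Gzip input is detected by its magic
// bytes (0x1f 0x8b) and decompressed first.
function parseWarc(input) {
  const data = input[0] === 0x1f && input[1] === 0x8b ? zlib.gunzipSync(input) : input;
  const records = [];
  let offset = 0;
  while (offset < data.length) {
    // The header section ends at the first blank line.
    const headerEnd = data.indexOf('\r\n\r\n', offset);
    if (headerEnd === -1) break;
    const lines = data.toString('utf8', offset, headerEnd).split('\r\n');
    const version = lines.shift(); // e.g. "WARC/1.1"
    const headers = {};
    for (const line of lines) {
      const idx = line.indexOf(':');
      if (idx > 0) {
        headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
      }
    }
    const length = Number.parseInt(headers['content-length'], 10);
    if (Number.isNaN(length)) break; // malformed record, stop parsing
    const bodyStart = headerEnd + 4;
    records.push({ version, headers, body: data.subarray(bodyStart, bodyStart + length) });
    offset = bodyStart + length + 4; // skip the CRLF CRLF after the block
  }
  return records;
}

// Made-up sample record used only to exercise the parser.
const sample = Buffer.from(
  'WARC/1.1\r\n' +
    'WARC-Type: resource\r\n' +
    'WARC-Target-URI: https://example.com/\r\n' +
    'Content-Type: text/plain\r\n' +
    'Content-Length: 5\r\n' +
    '\r\n' +
    'hello' +
    '\r\n\r\n'
);

for (const r of parseWarc(sample)) {
  console.log(r.headers['warc-type'], r.headers['warc-target-uri'], r.body.length);
}
// prints: resource https://example.com/ 5
```

This version loads the whole archive into memory, which is fine for a single page. For large archives, a streaming parser would be a better fit.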

Previewing WARC Files

You can preview WARC files using ReplayWeb.page, a web-based tool for viewing archived web content. This tool allows you to interact with the archived pages as if you were browsing them live.

License

This project is licensed under the MIT License. See the LICENSE file for details.
