    Repositories list

    • Testbed for some NetarchiveSuite Docker experiments
      Jinja
      5201 Updated Oct 14, 2024
    • NetarchiveSuite 5.X development
      Java
      Other
      23183121 Updated Oct 14, 2024
    • crawlrss

      Public
      Crawl RSS - Heritrix 3 add-on
      Java
      Other
      6000 Updated Oct 14, 2024
    • heritrix3

      Public
      Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
      Java
      Other
      764000 Updated Oct 14, 2024
    • A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.
      Java
      Apache License 2.0
      21102703 Updated Aug 5, 2024
    • Small wrapper to start/stop and communicate with Heritrix 3.
      Java
      Apache License 2.0
      2317 Updated Jul 12, 2024
    • JWAT Tools
      Java
      2521 Updated Dec 13, 2023
    • jwat

      Public
      Java Web Archive Toolkit
      Java
      2323 Updated Dec 13, 2023
    • Processes all WARC files listed in a text file with JWARC and sends them to a CDX server (Outback CDX etc.). If the process is stopped and restarted, it continues from where it left off.
      Java
      Apache License 2.0
      0110 Updated Dec 7, 2023
    • JWAT Tools minimal GUI version
      Java
      2001 Updated Dec 5, 2023
    • jwarc

      Public
      Java library for reading and writing WARC files with a typed API
      Java
      Apache License 2.0
      8000 Updated Sep 18, 2023
    • WARC and ARC indexing and discovery tools.
      Java
      25030 Updated Aug 8, 2023
    • Summarize web archive holdings using an existing Solr index
      Shell
      Apache License 2.0
      2000 Updated May 17, 2023
    • so-me

      Public
      Social Media harvests
      Shell
      Apache License 2.0
      08110 Updated Jan 11, 2023
    • Using the solrwaybackrootproxy improves playback; it can redirect and fix leaked resources.
      Java
      0101 Updated Sep 9, 2022
    • webdanica

      Public
      System for finding Danish webpages outside the .dk domain
      Java
      3106 Updated Jul 7, 2022
    • Danish Royal Library customisations and modifications
      TypeScript
      GNU Affero General Public License v3.0
      33000 Updated Jun 14, 2022
    • netsearch

      Public
      Merged search-arctika and search-achon into a multi-module project
      Java
      212106 Updated May 20, 2022
    • heatmap

      Public
      A GitHub-inspired graph for visualising activity
      JavaScript
      33201 Updated Sep 6, 2021
    • dvenabler

      Public
      Adds DocValues to Solr index fields without a full re-index
      Java
      Apache License 2.0
      18103 Updated Jun 7, 2021
    • Project to create a customised OpenWayback for Netarkivet using Maven overlays.
      Java
      0000 Updated Mar 19, 2021
    • NetarchiveSuite fork of OpenWayback
      Java
      Apache License 2.0
      276100 Updated Mar 19, 2021
    • Vagrant project to spin up a single-node VM running current versions of Hadoop, Hive and Spark
      Shell
      Apache License 2.0
      58000 Updated Dec 7, 2020
    • Attempts to create Hadoop jobs from other processes
      Java
      Apache License 2.0
      2000 Updated Mar 27, 2020
    • logtrix

      Public
      Java library/tool for parsing and summarising Heritrix crawl logs
      Java
      Apache License 2.0
      1100 Updated May 24, 2019
    • umbra

      Public
      A queue-controlled browser automation tool for improving web crawl quality
      Python
      Apache License 2.0
      25000 Updated Apr 23, 2019
    • Shell
      1200 Updated Mar 29, 2019
    • Small FITS wrapper that runs it using a custom classloader and provides some basic JAXB (un)marshalling of the XML output.
      Java
      Apache License 2.0
      2000 Updated Oct 2, 2018
    • WARC writer for INA's Live Archiving Proxy
      Java
      1000 Updated Aug 31, 2018
    • Wayback resource store using JWAT
      Java
      1000 Updated Aug 30, 2018