🔍 WIKIDUMP SEARCH

An offline tool for searching keywords in a Wikipedia archive (dump) instead of calling an online Wikipedia API.


🎯 BENEFITS

  • Useful when you need to search for many keywords in Wikipedia: online Wikipedia API wrappers such as Wikipedia may slow down after a few dozen calls.
  • Helpful if your internet connection is slow, because the search runs entirely offline.
  • Uses very few local resources.

🛠️ REQUIREMENTS

Install the required packages using pip install -r "./requirements.txt"

You also need to download one dump (image/backup) from the wiki-archive page.


⚙️ SETUP

Download

  • enwiki-{date}-pages-articles-multistream.xml.bz2 (~23 GB)
  • enwiki-{date}-pages-articles-multistream-index.txt.bz2 (~250 MB)
    • Extract this file. It will contain enwiki-{date}-pages-articles-multistream-index.txt (~1.2 GB)
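If no command-line bzip2 tool is at hand, the index can also be extracted with Python's standard library. The following is a minimal sketch; the helper name extract_bz2 is hypothetical and not part of this project.

```python
import bz2
import shutil


def extract_bz2(src_path, dst_path, chunk_size=1024 * 1024):
    """Decompress a .bz2 file to dst_path in fixed-size chunks.

    Hypothetical helper for illustration; it streams the data so the
    extracted index (~1.2 GB) never has to fit in memory at once.
    """
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path
```

Only the index file needs extracting this way; the large multistream dump is designed to stay compressed and be read one stream at a time.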

The paths to these files are required when initializing the offline wiki class.
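To show why both files are needed, here is a rough sketch of how a multistream dump can be searched in general. It assumes the standard Wikimedia index format, where each line reads offset:page_id:title and the offset is the byte position of a bz2 stream inside the dump. The function names are hypothetical and this is not necessarily how this project's class is implemented.

```python
import bz2


def parse_index_line(line):
    """Split 'offset:page_id:title' into (int, int, str).

    Titles may themselves contain ':', so split at most twice.
    """
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def find_offset(index_path, wanted_title):
    """Scan the extracted index and return the byte offset of the
    stream holding wanted_title, or None if the title is absent."""
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            offset, _page_id, title = parse_index_line(line)
            if title == wanted_title:
                return offset
    return None


def read_stream(dump_path, offset, chunk_size=256 * 1024):
    """Decompress exactly one bz2 stream starting at offset and
    return its XML text (a group of <page> elements)."""
    decompressor = bz2.BZ2Decompressor()
    parts = []
    with open(dump_path, "rb") as f:
        f.seek(offset)
        # Stop as soon as this stream ends; later streams are untouched.
        while not decompressor.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts.append(decompressor.decompress(chunk))
    return b"".join(parts).decode("utf-8")
```

Because only one small stream is decompressed per lookup, the ~23 GB dump never has to be unpacked. A linear scan of the index is the simplest approach; for many keywords, loading the titles into a dictionary once would likely be faster.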


📝 EXAMPLE

See testing.ipynb